ESG Exploratory Data Analysis

#INTRODUCTION

#What is ESG?

#“Environmental, social, and governance (ESG) investing refers to a set of standards for a company’s behavior used by socially conscious investors to screen potential investments. Environmental criteria consider how a company safeguards the environment, including corporate policies addressing climate change, for example. Social criteria examine how it manages relationships with employees, suppliers, customers, and the communities where it operates. Governance deals with a company’s leadership, executive pay, audits, internal controls, and shareholder rights.” (Investopedia)

#In this data analysis project, we explore the ESG scores of key sectors and industries. The dataset includes 245 companies operating worldwide in different sectors. I collected the data from the Yahoo Finance website, choosing the companies at random. All ESG ratings are provided by Sustainalytics. This exploratory data analysis aims to give the reader key insights into the sectors with high and low ESG risk scores. I also break the ESG risk scores down into their individual environmental, social and governance components. The dataset includes:

#Environmental Risk Scores
#Social Risk Scores
#Governance Risk Scores
#Total ESG Risk Score = Environmental + Social + Governance Risk Score
#ESG Score Group: High, Medium and Low
#Controversy Points/Level: None, Low Controversy, Moderate, Significant, High, Severe
#Sector
#Industry (Sub-sectors)

#METHODOLOGY

#I collected the data from Yahoo Finance using its ESG risk scores (250 randomly chosen global companies across several sectors and industries; data as of August 2022). The dataset loaded below contains 245 rows.

#LOADING THE NECESSARY PACKAGES

library(readxl)

## Warning: package 'readxl' was built under R version 4.2.3

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.2.3

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.2.3

## Warning: package 'tibble' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'readr' was built under R version 4.2.3

## Warning: package 'purrr' was built under R version 4.2.3

## Warning: package 'dplyr' was built under R version 4.2.3

## Warning: package 'stringr' was built under R version 4.2.3

## Warning: package 'forcats' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggvis)

## Warning: package 'ggvis' was built under R version 4.2.3

## 
## Attaching package: 'ggvis'
## 
## The following object is masked from 'package:ggplot2':
## 
##     resolution

library(dplyr)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following objects are masked from 'package:ggvis':
## 
##     add_data, hide_legend
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(lubridate)
library(ggthemes)

## Warning: package 'ggthemes' was built under R version 4.2.3

library(tidyr)
library(viridis)

## Loading required package: viridisLite

#LOADING THE DATASET AND DATA EXPLORATION

#Loading the dataset
esg_data <- read_excel("ESG Data-Final.xlsx")

#Check the first few rows of the dataset
head(esg_data)

## # A tibble: 6 × 11
##   company_name            env_risk_score social_risk_score governance_risk_score
##   <chr>                            <dbl>             <dbl>                 <dbl>
## 1 Alphabet Inc.                      1.7              11.1                  11.4
## 2 Apple Inc.                         0.6               6.9                   9.2
## 3 Exxon Mobil Corporation           18.5               9.8                   8.1
## 4 Bank of America Corpor…            1.6              14.4                  11.2
## 5 The Walt Disney Company            0.1               7.9                   6.9
## 6 Walmart Inc.                       4.4              13.9                   6.3
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>

#STRUCTURE AND CLASS

#Checking the structure of the dataset and some key features of the variables
str(esg_data)

## tibble [245 × 11] (S3: tbl_df/tbl/data.frame)
##  $ company_name         : chr [1:245] "Alphabet Inc." "Apple Inc." "Exxon Mobil Corporation" "Bank of America Corporation" ...
##  $ env_risk_score       : num [1:245] 1.7 0.6 18.5 1.6 0.1 4.4 5.6 0 0 0.6 ...
##  $ social_risk_score    : num [1:245] 11.1 6.9 9.8 14.4 7.9 13.9 14.8 16.4 11.6 3.8 ...
##  $ governance_risk_score: num [1:245] 11.4 9.2 8.1 11.2 6.9 6.3 9.9 6.4 5.9 11.6 ...
##  $ total_esg_risk_score : num [1:245] 24 17 36 27 15 25 30 23 18 16 ...
##  $ esg_score_group      : chr [1:245] "Medium" "Low" "High" "Medium" ...
##  $ controversy_points   : num [1:245] 4 3 3 3 2 4 3 3 3 1 ...
##  $ controversy_level    : chr [1:245] "High" "Significant" "Significant" "Significant" ...
##  $ peer_avg             : num [1:245] 1.5 1.5 2.1 2.2 1.4 2.2 1.9 1.9 1.9 1.6 ...
##  $ sector               : chr [1:245] "Communication Services" "Technology" "Energy" "Financial Services" ...
##  $ industry             : chr [1:245] "Internet Content & Information" "Consumer Electronics" "Oil & Gas Integrated" "Banks-Diversified" ...
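
#The controversy level is stored as plain text, so R does not know that "High" ranks above "Significant". A minimal sketch of turning it into an ordered factor follows. The first four values are copied from the str() output above; the level order and the exact label strings (for example "Low") are assumptions that should be checked against unique(esg_data$controversy_level).

```r
# First four rows, copied from the str() output above
toy_points <- c(4, 3, 3, 3)
toy_levels <- c("High", "Significant", "Significant", "Significant")

# Assumed ordering of the labels, from least to most severe
level_order <- c("None", "Low", "Moderate", "Significant", "High", "Severe")

toy_factor <- factor(toy_levels, levels = level_order, ordered = TRUE)

# With this ordering, the integer code minus 1 should equal the points (0 to 5)
implied_points <- as.integer(toy_factor) - 1
print(data.frame(toy_levels, toy_points, implied_points))

# Ordered factors support comparisons such as "at least Significant"
print(toy_factor >= "Significant")
```

#An ordered factor also makes plots and tables list the levels by severity instead of alphabetically.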

#Class
class(esg_data) #It's a tibble, which is also a data frame.

## [1] "tbl_df"     "tbl"        "data.frame"

#COLUMNS AND NUMBER OF OBSERVATIONS

#Column Names 
colnames(esg_data)

##  [1] "company_name"          "env_risk_score"        "social_risk_score"    
##  [4] "governance_risk_score" "total_esg_risk_score"  "esg_score_group"      
##  [7] "controversy_points"    "controversy_level"     "peer_avg"             
## [10] "sector"                "industry"

#The number of observations
dim(esg_data) #The data contains 11 different columns/variables with 245 different rows/observations.

## [1] 245  11

#MISSING VALUES

#Checking whether the data has missing (NA) values

anyNA(esg_data) #FALSE means there are no missing values in the dataset.

## [1] FALSE

#Counting the missing values in each column

colSums(is.na(esg_data))

##          company_name        env_risk_score     social_risk_score 
##                     0                     0                     0 
## governance_risk_score  total_esg_risk_score       esg_score_group 
##                     0                     0                     0 
##    controversy_points     controversy_level              peer_avg 
##                     0                     0                     0 
##                sector              industry 
##                     0                     0
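
#A note on missing-value checks in R: is.null() only tests whether an object itself is NULL, while is.na() and anyNA() look at the individual cells. The sketch below uses a hypothetical toy data frame (the names toy, company and score are made up for illustration) to show the difference.

```r
# Hypothetical toy data frame with one missing value
toy <- data.frame(
  company = c("A", "B", "C"),
  score   = c(10.5, NA, 7.2)
)

is_null_result <- is.null(toy)          # tests the object, not its cells
any_na_result  <- anyNA(toy)            # tests whether any cell is NA
na_per_column  <- colSums(is.na(toy))   # number of NA cells in each column

print(is_null_result)
print(any_na_result)
print(na_per_column)
```

#Here is.null() reports FALSE even though a value is missing, so it cannot be used as a missing-value check.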

#ASSIGNING THE VARIABLES

#Let’s assign names to variables we’re going to use

env_risk <- esg_data$env_risk_score
social_risk <- esg_data$social_risk_score
gov_risk <- esg_data$governance_risk_score
total_risk <- esg_data$total_esg_risk_score
sector <- esg_data$sector
industry <- esg_data$industry
risk_group <- esg_data$esg_score_group
controversy_points <- esg_data$controversy_points
peer_avg <- esg_data$peer_avg
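
#Standalone vectors like these are convenient, but they do not follow the data frame when it is filtered or sorted. A small sketch of the pitfall, using hypothetical toy values (toy and toy_env_risk are illustrative names, not part of the dataset):

```r
# Hypothetical toy data, only to illustrate the pitfall
toy <- data.frame(
  sector = c("Energy", "Technology", "Energy", "Utilities"),
  env_risk_score = c(18.5, 0.6, 22.7, 9.1)
)

# A standalone copy, in the same style as env_risk above
toy_env_risk <- toy$env_risk_score

# After filtering, the data frame has fewer rows than the standalone vector
toy_energy <- toy[toy$sector == "Energy", ]

print(nrow(toy_energy))      # rows left after filtering
print(length(toy_env_risk))  # the copy still has the original length

# Referring to the column through the data frame keeps values and rows aligned
print(toy_energy$env_risk_score)
```

#For this reason it is usually safer to refer to columns by name inside dplyr and ggplot2 calls on a filtered or sorted data frame.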

#SUMMARY STATISTICS

#Summary statistics of environmental, social and governance risk scores as well as total risk scores
summary(esg_data[2:5])

##  env_risk_score   social_risk_score governance_risk_score total_esg_risk_score
##  Min.   : 0.000   Min.   : 1.300    Min.   : 3.200        Min.   : 6.00       
##  1st Qu.: 1.600   1st Qu.: 7.000    1st Qu.: 5.500        1st Qu.:17.00       
##  Median : 4.400   Median : 9.000    Median : 6.700        Median :23.00       
##  Mean   : 6.311   Mean   : 9.434    Mean   : 7.249        Mean   :22.99       
##  3rd Qu.: 9.700   3rd Qu.:11.800    3rd Qu.: 8.400        3rd Qu.:28.00       
##  Max.   :25.300   Max.   :21.000    Max.   :15.500        Max.   :55.00

#Q&A: What are the lowest and highest risk scores in each group?

#Environmental ==> Highest: 25.3 Lowest: 0.0
#Social ==> Highest: 21.0 Lowest: 1.3
#Governance ==> Highest: 15.5 Lowest: 3.2
#Total ==> Highest: 55 Lowest: 6
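
#The dataset description defines the total ESG risk score as the sum of the three component scores. A quick sanity check on the six rows printed by head() is sketched below, with values copied from the head() and str() output. In these rows the total equals the rounded sum; that is an observation on six rows, not a statement about how the provider computes the total, so the same check is worth running on the full data.

```r
# Values copied from the head() and str() output earlier in this document
first_rows <- data.frame(
  company_name = c("Alphabet Inc.", "Apple Inc.", "Exxon Mobil Corporation",
                   "Bank of America Corporation", "The Walt Disney Company",
                   "Walmart Inc."),
  env    = c(1.7, 0.6, 18.5, 1.6, 0.1, 4.4),
  social = c(11.1, 6.9, 9.8, 14.4, 7.9, 13.9),
  gov    = c(11.4, 9.2, 8.1, 11.2, 6.9, 6.3),
  total  = c(24, 17, 36, 27, 15, 25)
)

# Sum of the three components and a flag for whether it rounds to the total
first_rows$component_sum <- first_rows$env + first_rows$social + first_rows$gov
first_rows$matches_total <- round(first_rows$component_sum) == first_rows$total

print(first_rows[, c("company_name", "component_sum", "total", "matches_total")])
```

#On the full dataset, rows where matches_total is FALSE would be the ones to inspect by hand.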

#VISUALIZING THE RISK SCORES

#Let’s visualize the environmental, social and governance risk scores using histograms

#Histogram of the environmental risk scores
hist(env_risk, breaks = 15) #Right skewed

#Histogram of the social risk scores
hist(social_risk, breaks = 15) #Roughly symmetric; the sample of social risk scores looks close to a normal distribution

#Histogram of the governance risk scores
hist(gov_risk, breaks = 15) #Right skewed

#Histogram of the total risk scores
hist(total_risk, breaks = 15)

#Density Plots

plot(density(env_risk), main = "Environmental Risk Scores")

plot(density(social_risk), main='Social Risk Scores')

plot(density(gov_risk), main='Governance Risk Scores')

plot(density(total_risk), main='Total Risk Scores')
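
#Reading skewness off a histogram is subjective, so a numeric measure can back it up. Below is a minimal sketch of the moment-based sample skewness; sample_skewness is a helper written for this illustration, not a function from any loaded package. It is applied only to the first ten values shown in the str() output, so the numbers illustrate the method rather than describe the full sample.

```r
# Sample skewness (third standardized moment); positive values suggest a right skew
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

# First ten values of each score, copied from the str() output above
env_first10    <- c(1.7, 0.6, 18.5, 1.6, 0.1, 4.4, 5.6, 0, 0, 0.6)
social_first10 <- c(11.1, 6.9, 9.8, 14.4, 7.9, 13.9, 14.8, 16.4, 11.6, 3.8)

env_skew    <- sample_skewness(env_first10)
social_skew <- sample_skewness(social_first10)
print(c(environmental = env_skew, social = social_skew))
```

#In the notebook itself, sample_skewness(env_risk) and sample_skewness(social_risk) would give the values for all 245 companies.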

#How many unique sectors are there in the data set?
n_distinct(sector) #11 unique sectors

## [1] 11

#How many unique industries are there in the data set?
n_distinct(industry) #87 unique industries

## [1] 87

#Levels in the sectors
unique(esg_data$sector) #We have 11 unique sectors: 1) Communication Services 2) Technology 3) Energy 4) Financial Services 5) Consumer Defensive 6) Consumer Cyclical 7) Healthcare 8) Real Estate 9) Industrials 10) Basic Materials 11) Utilities

##  [1] "Communication Services" "Technology"             "Energy"                
##  [4] "Financial Services"     "Consumer Defensive"     "Consumer Cyclical"     
##  [7] "Healthcare"             "Real Estate"            "Industrials"           
## [10] "Basic Materials"        "Utilities"

#Levels in the industries
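unique(esg_data$industry) #We have 87 unique industry (sub-sector) labels. Note that "Utilities-Diversified" appears twice in the output, once with a hyphen and once with a long dash (U+2014), so the number of truly distinct industries is likely 86.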

##  [1] "Internet Content & Information"       
##  [2] "Consumer Electronics"                 
##  [3] "Oil & Gas Integrated"                 
##  [4] "Banks-Diversified"                    
##  [5] "Entertainment"                        
##  [6] "Discount Stores"                      
##  [7] "Internet Retail"                      
##  [8] "Healthcare Plans"                     
##  [9] "Insurance-Diversified"                
## [10] "Medical Distribution"                 
## [11] "Telecom Services"                     
## [12] "Software-Infrastructure"              
## [13] "Home Improvement Retail"              
## [14] "REIT-Mortgage"                        
## [15] "Staffing & Employment Services"       
## [16] "Pharmaceutical Retailers"             
## [17] "Packaged Foods"                       
## [18] "Auto Manufacturers"                   
## [19] "Aerospace & Defense"                  
## [20] "Household & Personal Products"        
## [21] "Specialty Industrial Machinery"       
## [22] "Luxury Goods"                         
## [23] "Restaurants"                          
## [24] "Specialty Chemicals"                  
## [25] "Chemicals"                            
## [26] "Grocery Stores"                       
## [27] "Farm & Heavy Construction Machinery"  
## [28] "Farm Products"                        
## [29] "Banks-Regional"                       
## [30] "Real Estate-Development"              
## [31] "Semiconductors"                       
## [32] "Tobacco"                              
## [33] "Food Distribution"                    
## [34] "Beverages-Non-Alcoholic"              
## [35] "Beverages-Brewers"                    
## [36] "Resorts & Casinos"                    
## [37] "Drug Manufacturers-General"           
## [38] "Confectioners"                        
## [39] "Travel Services"                      
## [40] "Information Technology Services"      
## [41] "Credit Services"                      
## [42] "Software-Application"                 
## [43] "Footwear & Accessories"               
## [44] "Asset Management"                     
## [45] "Electronic Gaming & Multimedia"       
## [46] "Mortgage Finance"                     
## [47] "Oil & Gas E&P"                        
## [48] "Specialty Business Services"          
## [49] "Waste Management"                     
## [50] "Medical Services"                     
## [51] "Apparel Retail"                       
## [52] "Insurance-Life"                       
## [53] "Airlines"                             
## [54] "Oil & Gas Equipment & Services"       
## [55] "Leisure"                              
## [56] "Integrated Freight & Logistics"       
## [57] "REIT-Office"                          
## [58] "Real Estate Services"                 
## [59] "Insurance Brokers"                    
## [60] "Financial Data & Stock Exchanges"     
## [61] "Publishing"                           
## [62] "REIT-Industrial"                      
## [63] "REIT-Specialty"                       
## [64] "Medical Care Facilities"              
## [65] "Medical Instruments & Supplies"       
## [66] "REIT-Healthcare Facilities"           
## [67] "Communication Equipment"              
## [68] "Computer Hardware"                    
## [69] "Education & Training Services"        
## [70] "Diagnostics & Research"               
## [71] "Utilities-Regulated Electric"         
## [72] "Utilities-Diversified"                
## [73] "Utilities—Renewable"                  
## [74] "Utilities—Independent Power Producers"
## [75] "Utilities-Regulated Gas"              
## [76] "Utilities—Diversified"                
## [77] "Conglomerates"                        
## [78] "Capital Markets"                      
## [79] "Railroads"                            
## [80] "Biotechnology"                        
## [81] "Oil & Gas Refining & Marketing"       
## [82] "Thermal Coal"                         
## [83] "Gold"                                 
## [84] "Copper"                               
## [85] "Agricultural Inputs"                  
## [86] "Steel"                                
## [87] "Other Industrial Metals & Mining"
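
#Some utility labels in the list above use a long dash (U+2014) where others use a plain hyphen, so the same industry name can be counted twice. A minimal sketch of normalizing the dash follows, using four labels copied from the output. Treating the two "Diversified" spellings as the same industry is an assumption.

```r
# Four labels copied from the output above; two of them use a long dash (U+2014)
industry_labels <- c(
  "Utilities-Diversified",
  "Utilities\u2014Diversified",
  "Utilities\u2014Renewable",
  "Utilities-Regulated Gas"
)

# Replace the long dash with a plain hyphen so equal names compare equal
normalized_labels <- gsub("\u2014", "-", industry_labels, fixed = TRUE)

labels_before <- length(unique(industry_labels))
labels_after  <- length(unique(normalized_labels))
print(c(before = labels_before, after = labels_after))
```

#Applied to the real data, the same gsub() call on esg_data$industry before grouping would merge the duplicated label into one group.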

#DATA VISUALIZATION AND ANALYSIS

#TOTAL ESG RISK SCORES BY SECTOR

#Calculating the average total ESG risk score per sector
esg_sector <- esg_data %>%
  group_by(sector) %>%
  summarise(avg_esg_risk_score = mean(total_esg_risk_score, na.rm = TRUE)) %>%
  arrange(avg_esg_risk_score)

#Adding a ranking variable
esg_sector <- esg_sector %>%
  mutate(sector_rank = row_number())

#Creating a horizontal bar chart with sector rankings
ggplot(esg_sector, aes(x = reorder(sector, avg_esg_risk_score), y = avg_esg_risk_score)) +
  geom_bar(stat = "identity", fill = "brown") +
  coord_flip() +
  geom_text(aes(label = round(avg_esg_risk_score, 2)), vjust = 0.5, hjust = 1.1, color = "white", size = 3) +
  xlab("Sectors") +
  ylab("Average Total ESG Risk Score") +
  ggtitle("Average Total ESG Risk Score Ranking by Sector") +
  theme_minimal()

#As seen in the bar chart, energy is the sector with the highest average total ESG risk score, followed by the basic materials and utilities sectors.
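
#A sector average is easier to judge when the number of companies behind it is shown next to it, since an average over a handful of companies is less stable. A base R sketch on hypothetical toy data follows (the sector assignments and scores are made up for illustration).

```r
# Hypothetical toy data; sector assignments and scores are made up
toy <- data.frame(
  sector = c("Energy", "Energy", "Energy", "Technology", "Technology", "Utilities"),
  total_esg_risk_score = c(36, 55, 43, 17, 16, 30)
)

# Average score and number of companies per sector (base R)
sector_mean  <- aggregate(total_esg_risk_score ~ sector, data = toy, FUN = mean)
sector_count <- aggregate(total_esg_risk_score ~ sector, data = toy, FUN = length)

sector_summary <- data.frame(
  sector = sector_mean$sector,
  avg_esg_risk_score = sector_mean$total_esg_risk_score,
  n_companies = sector_count$total_esg_risk_score
)

# Sort from lowest to highest average, as in the ranking above
sector_summary <- sector_summary[order(sector_summary$avg_esg_risk_score), ]
print(sector_summary)
```

#With dplyr, the same count can be added to the existing summarise() call as n_companies = n().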

#TOTAL ESG RISK SCORES BY INDUSTRY (20 HIGHEST)

#Calculating the average total ESG risk score per industry
esg_industry <- esg_data %>%
  group_by(industry) %>%
  summarise(avg_esg_risk_score = mean(total_esg_risk_score, na.rm = TRUE)) %>%
  arrange(avg_esg_risk_score)

#Ranking the industries and keeping the 20 with the highest average score
esg_industry <- esg_industry %>%
  mutate(industry_rank = row_number()) %>%
  tail(20)

#Creating a bar chart with industry rankings
ggplot(esg_industry, aes(x = reorder(industry, avg_esg_risk_score), y = avg_esg_risk_score)) +
  geom_bar(stat = "identity", fill = "purple") +
  coord_flip() +
  geom_text(aes(label = round(avg_esg_risk_score, 2)), vjust = 0.5, hjust = 1.1, color = "white", size = 3) +
  xlab("Industries") +
  ylab("Average Total ESG Risk Score") +
  ggtitle("Average Total ESG Risk Score Ranking by Industry (20 Highest)") +
  theme_minimal()

#Q&A: Which industry has the highest ESG risk score? It’s the oil & gas industry.

#TOTAL ESG RISK SCORES BY INDUSTRY (20 LOWEST)

#Calculating the average total ESG risk score per industry
esg_industry <- esg_data %>%
  group_by(industry) %>%
  summarise(avg_esg_risk_score = mean(total_esg_risk_score, na.rm = TRUE)) %>%
  arrange(avg_esg_risk_score)

#Ranking the industries and keeping the 20 with the lowest average score
esg_industry <- esg_industry %>%
  mutate(industry_rank = row_number()) %>%
  head(20)

#Creating a bar chart with industry rankings
ggplot(esg_industry, aes(x = reorder(industry, avg_esg_risk_score), y = avg_esg_risk_score)) +
  geom_bar(stat = "identity", fill = "purple") +
  coord_flip() +
  geom_text(aes(label = round(avg_esg_risk_score, 2)), vjust = 0.5, hjust = 1.1, color = "white", size = 3) +
  xlab("Industries") +
  ylab("Average Total ESG Risk Score") +
  ggtitle("Average Total ESG Risk Score Ranking by Industry (20 Lowest)") +
  theme_minimal()

#Q&A: Which industry has the lowest ESG risk score? Publishing

#COMPANY RANKING

#The five companies with the highest total ESG risk scores (ties at the cutoff are kept, so six rows are returned)
highest_risks <- esg_data %>%
  slice_max(total_esg_risk_score, n = 5, with_ties = TRUE)

#Result
print(highest_risks) #It's PetroChina Company Limited (ESG Risk Score: 55)

## # A tibble: 6 × 11
##   company_name            env_risk_score social_risk_score governance_risk_score
##   <chr>                            <dbl>             <dbl>                 <dbl>
## 1 PetroChina Company Lim…           22.7              19.4                  12.9
## 2 China Petroleum & Chem…           19.7              19.5                  13.1
## 3 Yankuang Energy Group …           25.3              16.3                   6.3
## 4 Occidental Petroleum C…           24.6              12                     6.6
## 5 Chesapeake Energy Corp…           19.7              12.4                  10.3
## 6 CNOOC Limited                     22.1              10.6                  10.6
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>
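
#The tibble above has six rows although five were requested, because rows tied at the cutoff are all kept. The base R sketch below reproduces that behavior on hypothetical scores (the values and the names a to g are made up for illustration).

```r
# Hypothetical scores with a tie at the fifth position
scores <- c(a = 55, b = 52, c = 48, d = 43, e = 42, f = 42, g = 30)

# Rank from highest to lowest; tied values share the smallest rank
score_rank <- rank(-scores, ties.method = "min")

# "Top 5 with ties": everything whose rank is at most 5
top5_with_ties <- scores[score_rank <= 5]

# "Exactly 5": sort and cut, which drops one of the tied values
top5_exact <- head(sort(scores, decreasing = TRUE), 5)

print(top5_with_ties)
print(top5_exact)
```

#Keeping ties is usually the safer default for rankings, since cutting at exactly five would leave out a company with the same score as the fifth one.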

#Which company has the lowest ESG risk score?

#The five companies with the lowest total ESG risk scores
lowest_risk <- esg_data %>%
  arrange(total_esg_risk_score) %>%
  slice(1:5)

#Result
print(lowest_risk) #It's Pearson plc (ESG Risk Score: 6)

## # A tibble: 5 × 11
##   company_name            env_risk_score social_risk_score governance_risk_score
##   <chr>                            <dbl>             <dbl>                 <dbl>
## 1 Pearson plc                        0                 1.9                   4  
## 2 CBRE Group, Inc.                   1.3               2.5                   3.2
## 3 Thomson Reuters Corpor…            0.1               3.2                   5.9
## 4 Prologis, Inc.                     2.5               1.7                   4.3
## 5 Accenture plc                      0.3               4.6                   4.8
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>

#ENVIRONMENTAL RISK SCORES

#Let’s investigate the environmental risk scores

#The top 10 companies with the highest environmental risk scores
highest_env_risk <- esg_data %>%
  arrange(desc(env_risk_score)) %>%
  slice(1:10)

#The 10 companies with the lowest environmental risk scores
lowest_env_risk <- esg_data %>%
  arrange(env_risk_score) %>%
  slice(1:10)

#Results-Highest Environmental Risk Scores
print("Top 10 companies with the highest environmental risk scores are:")

## [1] "Top 10 companies with the highest environmental risk scores are:"

print(highest_env_risk)

## # A tibble: 10 × 11
##    company_name           env_risk_score social_risk_score governance_risk_score
##    <chr>                           <dbl>             <dbl>                 <dbl>
##  1 Yankuang Energy Group…           25.3              16.3                   6.3
##  2 Coal India Limited               24.8              11.9                   5.9
##  3 Occidental Petroleum …           24.6              12                     6.6
##  4 PT United Tractors Tbk           22.8              12.1                   6.5
##  5 PetroChina Company Li…           22.7              19.4                  12.9
##  6 CNOOC Limited                    22.1              10.6                  10.6
##  7 EOG Resources, Inc.              20.4               8.1                   7.7
##  8 China Petroleum & Che…           19.7              19.5                  13.1
##  9 Chesapeake Energy Cor…           19.7              12.4                  10.3
## 10 Pioneer Natural Resou…           18.7               9                     9.1
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>

#Results-Lowest Environmental Risk Scores
print(lowest_env_risk)

## # A tibble: 10 × 11
##    company_name           env_risk_score social_risk_score governance_risk_score
##    <chr>                           <dbl>             <dbl>                 <dbl>
##  1 CVS Health Corporation            0                16.4                   6.4
##  2 UnitedHealth Group In…            0                11.6                   5.9
##  3 Randstad N.V.                     0                 7.1                   3.7
##  4 Adecco Group AG                   0                 8.4                   3.3
##  5 Pearson plc                       0                 1.9                   4  
##  6 Cisco Systems, Inc.               0                 5.6                   6.4
##  7 Elevance Health Inc.              0                 5.8                   5.7
##  8 Gilead Sciences, Inc.             0                14                     8  
##  9 Cigna Corporation                 0                 6                     5.7
## 10 The Walt Disney Compa…            0.1               7.9                   6.9
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>

#SOCIAL RISK SCORES

#The top 10 companies with the highest social risk scores
#(sorting by the social_risk_score column rather than the standalone social_risk vector)
highest_soc_risk <- esg_data %>%
  arrange(desc(social_risk_score)) %>%
  slice(1:10)

#The 10 companies with the lowest social risk scores
lowest_soc_risk <- esg_data %>%
  arrange(social_risk_score) %>%
  slice(1:10)

#Results-Highest Social Risk Scores

print("Q&A: Which companies have high social risk scores? Top 10 companies with the highest social risk scores are:")

## [1] "Q&A: Which companies have high social risk scores? Top 10 companies with the highest social risk scores are:"

print(highest_soc_risk)

## # A tibble: 10 × 11
##    company_name           env_risk_score social_risk_score governance_risk_score
##    <chr>                           <dbl>             <dbl>                 <dbl>
##  1 Meta Platforms, Inc.              1                21                    10.5
##  2 The Boeing Company                7.1              19.7                   7.9
##  3 China Petroleum & Che…           19.7              19.5                  13.1
##  4 PetroChina Company Li…           22.7              19.4                  12.9
##  5 Caterpillar Inc.                  7.2              18.5                   8.5
##  6 Industrial and Commer…            2.3              18.5                  15.2
##  7 Bayer Aktiengesellsch…            4                18.5                   7.3
##  8 Eli Lilly and Company             3.4              17.4                  11.6
##  9 HCA Healthcare, Inc.              3.5              17.3                   7.4
## 10 AbbVie Inc.                       1.1              16.8                   9.9
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>

#Q&A: Which industries have high social risk scores? Internet Content & Information, Aerospace & Defense, Oil & Gas, Drug Manufacturers.

#Q&A: Which sectors have high social risk scores? Industrials, Energy, Healthcare

#Results-Lowest Social Risk Scores
print("Q&A: Which companies have low social risk scores? The 10 companies with the lowest social risk scores are:")

## [1] "Q&A: Which companies have low social risk scores? The 10 companies with the lowest social risk scores are:"

print(lowest_soc_risk)

## # A tibble: 10 × 11
##    company_name           env_risk_score social_risk_score governance_risk_score
##    <chr>                           <dbl>             <dbl>                 <dbl>
##  1 Air Products and Chem…            5.9               1.3                   3.6
##  2 Prologis, Inc.                    2.5               1.7                   4.3
##  3 Pearson plc                       0                 1.9                   4  
##  4 Braskem S.A.                     15.6               1.9                   8.4
##  5 CBRE Group, Inc.                  1.3               2.5                   3.2
##  6 Lotte Chemical Corpor…           14.8               2.7                   9.8
##  7 Crown Castle Inc.                 5.2               2.9                   5.1
##  8 Thomson Reuters Corpo…            0.1               3.2                   5.9
##  9 SBA Communications Co…            3.1               3.2                   5.2
## 10 AGNC Investment Corp.             7.4               3.5                   6.2
## # ℹ 7 more variables: total_esg_risk_score <dbl>, esg_score_group <chr>,
## #   controversy_points <dbl>, controversy_level <chr>, peer_avg <dbl>,
## #   sector <chr>, industry <chr>

#A SPECIAL FOCUS ON THE ENERGY SECTOR

#The energy sector has the highest average total ESG risk score. The main factor behind a high ESG risk score appears to be a high environmental risk score, so we need to examine the sector and the companies operating in it more closely.

#Filtering the data based on the energy sector
energy_data <- esg_data %>%
  filter(sector == "Energy")

#Sorting and ranking
energy_rank <- energy_data %>%
  arrange(total_esg_risk_score) %>%
  mutate(energy_rank = row_number())

#Bar Chart
ggplot(energy_rank, aes(x = reorder(company_name, total_esg_risk_score), y = total_esg_risk_score)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  geom_text(aes(label = total_esg_risk_score), vjust = 0.5, hjust = 1.1, color = "white", size = 3) +
  xlab("Companies") +
  ylab("Total ESG Risk Score") +
  ggtitle("Total ESG Risk Score Comparison in the Energy Sector") +
  theme_minimal()

#The company with the highest ESG risk score is PetroChina Company Limited, with a score of 55. It is followed by China Petroleum Corporation and Yankuang Energy Group Company. The three companies with the highest ESG risk scores are all Chinese.

#The main factor leading to a high ESG risk score is having a high environmental risk score. So, to test this hypothesis, let’s check the environmental risk scores of the energy companies.

#Bar chart to visualize the environmental risk scores in the energy sector
ggplot(energy_data, aes(x = reorder(company_name, env_risk_score), y = env_risk_score)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  geom_text(aes(label = round(env_risk_score, 2)), hjust = -0.5, size = 3) +
  labs(x = "Company", y = "Environmental Risk Score", title = "Environmental Risk Scores of Energy Companies") +
  theme_minimal()

#According to the bar chart, the energy sector, with a few exceptions, has high environmental risk scores.
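
#One way to put a number on the role of the environmental score is its share of the three component scores. The sketch below uses the six highest-risk companies printed earlier (oil, gas and coal producers), with component scores copied from that output and company names shortened to the part visible there. If the three components contributed equally, each share would be one third.

```r
# Component scores copied from the highest-risk table printed earlier
# (company names shortened to the part visible in that output)
top_risk <- data.frame(
  company = c("PetroChina", "China Petroleum & Chem", "Yankuang Energy Group",
              "Occidental Petroleum", "Chesapeake Energy", "CNOOC Limited"),
  env    = c(22.7, 19.7, 25.3, 24.6, 19.7, 22.1),
  social = c(19.4, 19.5, 16.3, 12.0, 12.4, 10.6),
  gov    = c(12.9, 13.1, 6.3, 6.6, 10.3, 10.6)
)

# Share of the three-component sum that comes from the environmental score
component_sum <- top_risk$env + top_risk$social + top_risk$gov
top_risk$env_share <- round(top_risk$env / component_sum, 2)

# Companies ordered from the largest to the smallest environmental share
print(top_risk[order(-top_risk$env_share), c("company", "env_share")])
```

#In all six rows the environmental share is above one third, which is consistent with the environmental score being the largest single contributor for these companies. It does not by itself show that the same holds for the whole sector.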

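#Simple linear regression of the governance risk score on controversy points (all 245 companies, not only the energy sector)
lin.model1 <- lm(gov_risk ~ controversy_points)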
summary(lin.model1)

## 
## Call:
## lm(formula = gov_risk ~ controversy_points)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.457 -1.856 -0.357  1.344  7.343 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          5.4526     0.3633   15.01  < 2e-16 ***
## controversy_points   0.8015     0.1468    5.46 1.17e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.415 on 243 degrees of freedom
## Multiple R-squared:  0.1093, Adjusted R-squared:  0.1056 
## F-statistic: 29.81 on 1 and 243 DF,  p-value: 1.173e-07
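
#Reading the output above: the slope (0.8015) is positive and statistically significant, but the R-squared of 0.1093 means controversy points account for only about 11% of the variance in governance risk scores. In a one-predictor regression, R-squared is the squared correlation, which the sketch below checks on the first ten values from the str() output (a toy subset, so its coefficients differ from the full model) before converting the reported R-squared into a correlation.

```r
# First ten values of each variable, copied from the str() output earlier
gov_first10 <- c(11.4, 9.2, 8.1, 11.2, 6.9, 6.3, 9.9, 6.4, 5.9, 11.6)
controversy_first10 <- c(4, 3, 3, 3, 2, 4, 3, 3, 3, 1)

toy_fit <- lm(gov_first10 ~ controversy_first10)

# In a one-predictor regression, R-squared equals the squared correlation
toy_r_squared <- summary(toy_fit)$r.squared
toy_cor <- cor(gov_first10, controversy_first10)
print(c(r_squared = toy_r_squared, cor_squared = toy_cor^2))

# Correlation implied by the full-sample R-squared reported above (0.1093);
# the sign is positive because the estimated slope (0.8015) is positive
implied_cor <- sqrt(0.1093)
print(round(implied_cor, 2))
```

#A correlation of roughly 0.33 points to a real but modest association, so controversy points alone are a weak predictor of governance risk.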

#ENVIRONMENTAL, SOCIAL AND GOVERNANCE RISK SCORES BASED ON SECTORS

#In this section, I’ll try to examine which sectors have the highest and lowest risk scores with box plots.

#ENVIRONMENTAL

ggplot(esg_data, aes(y = env_risk_score, x = sector)) +
  geom_boxplot()

#Highest median environmental risk score (the center line of the box): Energy
#Highest individual environmental risk score: Energy

#SOCIAL

ggplot(esg_data, aes(y = social_risk_score, x = sector)) +
  geom_boxplot()

#Highest median social risk score (the center line of the box): Healthcare
#Highest individual social risk score: Communication Services

#GOVERNANCE

ggplot(esg_data, aes(y = governance_risk_score, x = sector)) +
  geom_boxplot()

#Highest median governance risk score (the center line of the box): Financial Services
#Highest individual governance risk score: Financial Services
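
#A box plot's center line is the median, not the mean, and the two can rank sectors differently when a sector contains extreme values. The sketch below shows this on hypothetical toy data (the sector names "A" and "B" and the scores are made up for illustration).

```r
# Hypothetical toy data: sector "B" has one extreme value
toy <- data.frame(
  sector = c("A", "A", "A", "B", "B", "B"),
  env_risk_score = c(8, 9, 10, 2, 3, 25)
)

sector_median <- tapply(toy$env_risk_score, toy$sector, median)
sector_mean   <- tapply(toy$env_risk_score, toy$sector, mean)

# The sector with the highest median need not have the highest mean
print(rbind(median = sector_median, mean = sector_mean))
print(names(which.max(sector_median)))
print(names(which.max(sector_mean)))
```

#Running the same two tapply() calls on esg_data would show whether the sector rankings read from the box plots also hold for the means.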

Oguzhan Gurbuz

2023-01-27