UK Car Accidents

17 Nov 2018

Content and Source of Data

This dataset provides detailed information about the circumstances of personal injury road accidents in the UK in 2015. The accidents were recorded by the police using the STATS19 accident reporting form. The source of this dataset is Open Data UK.

The dataset has 285,332 rows and 70 features (columns). A detailed glossary explaining each feature is attached.
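After downloading the file, you can confirm these dimensions by counting the data rows and header fields. The sketch below uses only Python's standard csv module and assumes the data is a single CSV file with a header row. The helper name csv_shape and the file name in the comment are hypothetical. With pandas, `data_set.shape` returns the same (rows, columns) pair.

```python
import csv
import io
from typing import TextIO, Tuple


def csv_shape(handle: TextIO) -> Tuple[int, int]:
    """Return (rows, columns) of a CSV stream, not counting the header row."""
    reader = csv.reader(handle)
    header = next(reader)             # first record holds the feature names
    n_rows = sum(1 for _ in reader)   # count the remaining data records
    return n_rows, len(header)


# Tiny in-memory stand-in for the real file:
sample = io.StringIO("accident_severity,weather_conditions\n3,1\n2,9\n")
print(csv_shape(sample))  # (2, 2)

# For the real data (hypothetical file name):
# with open("accidents_2015.csv", newline="") as f:
#     print(csv_shape(f))
```

csv.reader is used rather than counting raw lines, so a quoted field that contains a newline still counts as one row.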

Exploratory Data Analysis (EDA)

In this part, we give an overview of the dataset. The summary() function in R gives a quick view of the variables of interest:

summary(data_set[c("road_surface_conditions","accident_severity","weather_conditions")])
##  road_surface_conditions accident_severity weather_conditions
##  Min.   :-1.000          Min.   :1.000     Min.   :1.000     
##  1st Qu.: 1.000          1st Qu.:3.000     1st Qu.:1.000     
##  Median : 1.000          Median :3.000     Median :1.000     
##  Mean   : 1.292          Mean   :2.837     Mean   :1.495     
##  3rd Qu.: 2.000          3rd Qu.:3.000     3rd Qu.:1.000     
##  Max.   : 5.000          Max.   :3.000     Max.   :9.000
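The Python sketch below reproduces the layout of R's summary() for a single numeric column, which shows where each number in that output comes from. It assumes R's default quantile definition (type 7), which Python's statistics.quantiles matches with method="inclusive". The function name r_style_summary is hypothetical. The example also shows why this summary is of limited use for coded categories: the mean of a set of category codes is generally not itself a valid code.

```python
import statistics
from typing import Dict, Sequence


def r_style_summary(values: Sequence[float]) -> Dict[str, float]:
    """Six-number summary laid out like R's summary() for a numeric vector.

    R's default quantile (type 7) interpolates linearly between order
    statistics; statistics.quantiles(..., method="inclusive") uses the
    same rule.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "1st Qu.": q1,
        "Median": median,
        "Mean": statistics.fmean(values),
        "3rd Qu.": q3,
        "Max.": max(values),
    }


# Made-up severity codes: the mean (2.4) does not correspond to any category.
print(r_style_summary([1, 2, 3, 3, 3]))
# {'Min.': 1, '1st Qu.': 2.0, 'Median': 3.0, 'Mean': 2.4, '3rd Qu.': 3.0, 'Max.': 3}
```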

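These variables are integer codes for categories, not measurements. For example, accident_severity is coded 1 = Fatal, 2 = Serious, 3 = Slight, and -1 (the minimum of road_surface_conditions) marks missing or out-of-range data. The means and quartiles above are therefore only a rough guide; it is more informative to see how often each code occurs. A histogram shows this visually, for example for road surface conditions: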

hist(data_set$road_surface_conditions,ylab="Count",xlab="Road Surface Conditions",main="Road Surface Counts")
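Because road_surface_conditions holds integer codes, a histogram of it really shows how often each code occurs. The Python sketch below computes those counts directly. It assumes, as in the STATS19 glossary, that -1 marks missing data, and reports that bucket separately. The helper name code_frequencies is hypothetical. In R, `table(data_set$road_surface_conditions)` gives the same counts. In pandas, `Series.value_counts()` does similar counting.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

MISSING_CODE = -1  # STATS19 convention: -1 means data missing or out of range


def code_frequencies(codes: Iterable[int]) -> Dict[Optional[int], int]:
    """Count each category code; the missing code -1 is reported under None."""
    counts = Counter(None if c == MISSING_CODE else c for c in codes)
    # Missing bucket first, then the remaining codes in ascending order.
    ordered = sorted(counts.items(), key=lambda kv: (kv[0] is not None, kv[0] or 0))
    return dict(ordered)


print(code_frequencies([1, 1, 2, -1, 1, 5]))  # {None: 1, 1: 3, 2: 1, 5: 1}
```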

Another useful visualization tool is the seaborn package in Python. For example, to explore the number of casualties per accident broken down by accident severity:

import seaborn as sns  # data_set is assumed to be a pandas DataFrame of the accident records
sns.countplot(x="number_of_casualties", hue="accident_severity", data=data_set)
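A countplot counts the rows for each combination of the x and hue values. The sketch below reproduces those counts in plain Python on a few made-up records, so you can check the bar heights as numbers. The helper name casualty_severity_counts and the records are hypothetical. The severity labels follow the STATS19 coding (1 = Fatal, 2 = Serious, 3 = Slight). With pandas, `pd.crosstab(data_set["number_of_casualties"], data_set["accident_severity"])` produces the same table.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# STATS19 accident_severity codes.
SEVERITY_LABELS = {1: "Fatal", 2: "Serious", 3: "Slight"}


def casualty_severity_counts(
    rows: Iterable[Mapping[str, int]],
) -> Dict[Tuple[int, str], int]:
    """Count rows per (number_of_casualties, severity label) pair.

    These are the bar heights a countplot with x="number_of_casualties"
    and hue="accident_severity" draws; unknown codes are labelled "Unknown".
    """
    counts = Counter(
        (row["number_of_casualties"],
         SEVERITY_LABELS.get(row["accident_severity"], "Unknown"))
        for row in rows
    )
    return dict(sorted(counts.items()))


records = [
    {"number_of_casualties": 1, "accident_severity": 3},
    {"number_of_casualties": 1, "accident_severity": 3},
    {"number_of_casualties": 1, "accident_severity": 2},
    {"number_of_casualties": 2, "accident_severity": 1},
]
print(casualty_severity_counts(records))
# {(1, 'Serious'): 1, (1, 'Slight'): 2, (2, 'Fatal'): 1}
```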