This dataset provides detailed information about the circumstances of personal injury road accidents in the UK in 2015. The accidents were recorded by the police using the STATS19 accident reporting form. The source of this dataset is Open Data UK.
The dataset has 285332 rows and 70 features. A detailed glossary which explains each feature is attached.
This part gives an overview of the dataset. The summary function in R gives a quick view of the variables you are interested in:
summary(data_set[c("road_surface_conditions","accident_severity","weather_conditions")])
##  road_surface_conditions accident_severity weather_conditions
##  Min.   :-1.000          Min.   :1.000     Min.   :1.000
##  1st Qu.: 1.000          1st Qu.:3.000     1st Qu.:1.000
##  Median : 1.000          Median :3.000     Median :1.000
##  Mean   : 1.292          Mean   :2.837     Mean   :1.495
##  3rd Qu.: 2.000          3rd Qu.:3.000     3rd Qu.:1.000
##  Max.   : 5.000          Max.   :3.000     Max.   :9.000
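These three variables appear to be integer category codes rather than measurements (note the minimum of -1 for road_surface_conditions), so a mean such as 1.292 is hard to interpret. Counting accidents per code is often more informative. The following is a minimal sketch in Python with pandas, using a small made-up table in place of data_set. It assumes that -1 is a missing-value code, which should be checked against the glossary.

```python
import pandas as pd

# Made-up toy rows standing in for data_set (illustration only, not real data).
toy = pd.DataFrame({
    "road_surface_conditions": [1, 1, 2, -1, 1, 2, 5],
    "accident_severity": [3, 3, 2, 3, 1, 3, 3],
})

# Assumption: -1 marks a missing value, so drop those rows before counting.
valid = toy.loc[toy["road_surface_conditions"] != -1, "road_surface_conditions"]

# Count accidents per category code instead of averaging the codes.
surface_counts = valid.value_counts().sort_index()
print(surface_counts)
```

On the real data, the same two steps can be applied directly to data_set once it is loaded as a pandas DataFrame.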
You can also plot a variable as a histogram to visualize its distribution:
hist(data_set$road_surface_conditions,ylab="Count",xlab="Road Surface Conditions",main="Road Surface Counts")
Another useful tool for data visualization is the seaborn package in Python. For example, to explore how the number of casualties per accident is distributed, broken down by accident severity:
import seaborn as sns
sns.countplot(x="number_of_casualties", hue="accident_severity", data=data_set)
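The counts behind such a plot can also be inspected as a table. The sketch below uses pandas crosstab on a small made-up table (not the real data) to count accidents for each combination of number_of_casualties and accident_severity. The column names follow the dataset; the values are illustrative only.

```python
import pandas as pd

# Made-up toy rows standing in for data_set (illustration only, not real data).
toy = pd.DataFrame({
    "number_of_casualties": [1, 1, 2, 1, 3, 2, 1],
    "accident_severity": [3, 3, 2, 3, 1, 3, 3],
})

# Rows: number of casualties; columns: severity code; cells: accident counts.
table = pd.crosstab(toy["number_of_casualties"], toy["accident_severity"])
print(table)
```

Each cell of the table corresponds to the height of one bar in the count plot, which makes it easy to check exact values that are hard to read off a chart.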