Crime Prediction and Analysis Using Machine Learning

 INTRODUCTION

Crime is a significant threat to humankind. Crimes occur at regular intervals, and their number appears to be increasing and spreading at a fast rate. Crimes happen everywhere, from small villages and towns to big cities. There are many types of crime: robbery, murder, rape, assault, battery, false imprisonment, kidnapping and homicide. The police department is responsible for controlling and reducing criminal activity, and because crime is increasing, cases need to be solved much faster. Crime prediction and criminal identification are major problems for the police because of the tremendous amount of crime data that exists. Technology is needed that can make case solving faster.

The aim of this project is to predict crime using the features present in the dataset. The dataset is extracted from official sites. Using machine learning algorithms, with Python as the core language, we can predict the type of crime that will occur in a particular area.

CONCEPTS OF THE PROPOSED SYSTEM

Predictive Modeling

Predictive modeling is the process of building a model that is capable of making predictions. The process includes a machine learning algorithm that learns certain properties from a training dataset in order to make those predictions. Predictive modeling can be further divided into two areas: regression and pattern classification. Regression models are based on the analysis of relationships between variables and trends in order to make predictions about continuous variables. Classification models, in contrast, assign discrete class labels to new observations. Predicting the type of crime in an area is a classification task.

Types of Predictive Model Algorithms

Classification and Decision Trees
A decision tree uses a tree-shaped graph or model of decisions and their possible consequences, including chance event outcomes, costs and utility. It is one way to display an algorithm that contains only conditional control statements.
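To show how a decision tree turns data into conditional rules, here is a minimal sketch. It assumes scikit-learn's DecisionTreeClassifier and uses a small hypothetical dataset with features hour and is_weekend and labels 0 and 1. None of this data comes from the project.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: [hour_of_day, is_weekend] -> crime label (0 or 1)
X = [[9, 0], [10, 0], [14, 0], [22, 1], [23, 1], [1, 1]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The learned model is a set of nested if/else conditions on the features
print(export_text(tree, feature_names=["hour", "is_weekend"]))

# Classify two new (hypothetical) observations
predictions = tree.predict([[21, 1], [11, 0]])
print(predictions)  # [1 0]
```

The export_text output shows why a tree is "one way to display an algorithm". Each path from the root to a leaf reads as a chain of if/else tests.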

Naive Bayes – In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features. The technique constructs classifier models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from some finite set.
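The independence assumption means each feature contributes its own factor P(x_i | c), and the class with the largest P(c) multiplied by the product of these factors wins. The sketch below implements a small categorical naive Bayes from scratch to make that product explicit. It is written by hand rather than taken from a library. It works in log space and adds Laplace smoothing, and both of those are design choices of this sketch. The records (location, time of day) and their crime labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> Counter of values
    vocab = defaultdict(set)             # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab, len(labels)

def predict(model, row):
    """Pick the class c maximizing log P(c) + sum_i log P(x_i | c)."""
    class_counts, value_counts, vocab, n = model
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / n)
        for i, value in enumerate(row):
            # Naive independence: each feature adds its own log-likelihood term.
            # Add-one (Laplace) smoothing keeps unseen values from giving log(0).
            score += math.log((value_counts[(label, i)][value] + 1) / (count + len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (location, time of day) -> crime type
rows = [("street", "night"), ("street", "night"), ("residence", "day"),
        ("residence", "night"), ("store", "day"), ("store", "day")]
labels = ["battery", "battery", "burglary", "burglary", "theft", "theft"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("street", "night")))  # battery
print(predict(model, ("store", "day")))     # theft
```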

Linear Regression – Regression analysis is a statistical process for estimating the relationships among variables. Linear regression is an approach for modelling the relationship between a scalar dependent variable Y and one or more explanatory variables denoted X. The case of one explanatory variable is called simple linear regression. With more than one explanatory variable, it is called multiple linear regression.

Logistic Regression – In statistics, logistic regression is a regression model where the dependent variable is categorical, most commonly binary.
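The three cases above can be contrasted in a short sketch on hypothetical toy numbers. Simple linear regression uses the ordinary least squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄. Multiple linear regression solves the same least squares problem with several explanatory columns, here via NumPy. Logistic regression, taken from scikit-learn, predicts a binary label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simple linear regression y = b0 + b1 * x via the ordinary least squares formulas
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])  # toy data lying exactly on y = 1 + 2x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # 1.0 2.0

# Multiple linear regression: two explanatory variables plus an intercept column
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
X_multi = np.column_stack([np.ones(5), x, x2])
y_multi = 1 + 2 * x + 3 * x2
coef, *_ = np.linalg.lstsq(X_multi, y_multi, rcond=None)  # recovers [1, 2, 3]

# Logistic regression: the dependent variable is binary (hypothetical 0/1 labels)
X_bin = np.array([[0.5], [1.0], [1.5], [3.5], [4.0], [4.5]])
y_bin = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X_bin, y_bin)
print(clf.predict([[1.0], [4.0]]))  # [0 1]
```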

Data Preprocessing

This process includes methods to remove any null or infinite values that may affect the accuracy of the system. The main steps are formatting, cleaning and sampling. Formatting puts the data into a consistent, usable form. Cleaning removes or fixes missing data, since some records may be incomplete. Sampling selects an appropriate subset of the data, which can reduce the running time of the algorithm. The preprocessing is done using Python.
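The three preprocessing steps can be sketched with pandas. This sketch assumes a hypothetical frame with columns district, x_coord and crime_type. Formatting is shown as normalizing inconsistent text, which is one example of formatting and not necessarily the only one the project applies.

```python
import numpy as np
import pandas as pd

# Hypothetical raw frame with a missing value and an infinite value
df = pd.DataFrame({
    "district": [1, 2, np.nan, 4, 5, 6],
    "x_coord": [1.0, np.inf, 3.0, 4.0, 5.0, 6.0],
    "crime_type": [" THEFT", "battery ", "Theft", "ASSAULT", "theft", "Battery"],
})

# Formatting: normalize text so each category is spelled one way
df["crime_type"] = df["crime_type"].str.strip().str.upper()

# Cleaning: treat +/-inf as missing, then drop incomplete rows
df = df.replace([np.inf, -np.inf], np.nan).dropna()

# Sampling: keep a random fraction of rows to shorten training time
sample = df.sample(frac=0.5, random_state=42)
print(len(df), len(sample))  # 4 2
```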

Functional Diagram of Proposed Work

It can be divided into 4 parts:

1. Descriptive analysis on the data
2. Data treatment (missing value and outlier fixing)
3. Data modelling
4. Estimation of performance
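The descriptive analysis and outlier treatment parts can be sketched with pandas. The article does not say how outliers are fixed. This sketch assumes the common 1.5 × IQR capping rule and a hypothetical x_coord column.

```python
import pandas as pd

# Hypothetical numeric column with one extreme value
df = pd.DataFrame({"x_coord": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]})

# Descriptive analysis: count, mean, std, quartiles, min and max
print(df.describe())

# Outlier fixing (assumed rule): cap values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = df["x_coord"].quantile([0.25, 0.75])
iqr = q3 - q1
df["x_coord"] = df["x_coord"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
print(df["x_coord"].max())  # 15.0
```

Capping keeps the row in the dataset but limits its influence. Dropping the row is the alternative when the value is clearly an error.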


Prepare Data

1. Prepare the data in the right format for analysis.
2. Clean the data.

Analyze and Transform Variables

We may need to transform the variables using one of these approaches:

1. Normalization or standardization
2. Missing value treatment

Random Sampling (Train and Test)

• Training Sample: the model is developed on this sample. 70% or 80% of the data goes here.
• Test Sample: model performance is validated on this sample. 30% or 20% of the data goes here.

Model Selection

Based on the defined goal (supervised or unsupervised learning), we select one modeling technique or a combination of techniques, such as:

• KNN Classification
• Logistic Regression
• Decision Trees
• Random Forest
• Support Vector Machine (SVM)
• Bayesian methods

Build/Develop/Train Models

• Validate the assumptions of the chosen algorithm.
• Develop/train the model on the training sample, which is the available data (population).
• Check model performance: error and accuracy.
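The training-stage checks can be sketched for KNN. KNN relies on distances between points, so one assumption worth validating is that the features are on comparable scales. Error and accuracy can then be checked with k-fold cross-validation on the training sample only. The synthetic data from make_classification, k = 5 and the 5 folds are placeholders chosen for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the training sample
X_train, y_train = make_classification(n_samples=300, n_features=6, random_state=0)

# Standardize inside the pipeline so each fold is scaled using only its own training part
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# 5-fold cross-validation on the training sample
scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
accuracy = scores.mean()
error = 1 - accuracy
print(f"accuracy={accuracy:.3f} error={error:.3f}")
```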

Validate/Test Model

• Score and predict using the test sample.
• Check model performance: accuracy, etc.
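The model selection and validation loop can be sketched with scikit-learn classifiers for each family listed above: KNN, logistic regression, decision tree, random forest, SVM, and Gaussian naive Bayes as the Bayesian method. Each model is trained on the training sample and scored on the held-out test sample. The synthetic data and default hyperparameters are assumptions of this sketch, so its accuracies will not match the project's results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder data, split 80/20
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "KNN": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)        # train on the training sample
    predicted = model.predict(X_test)  # score the unseen test sample
    results[name] = accuracy_score(y_test, predicted)

# Rank the models by test accuracy
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```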

IMPLEMENTATION

The dataset used in this project is taken from Kaggle.com. It is maintained and updated by the Chicago Police Department. The implementation of this project is divided into the following steps:

Data collection

The crime dataset from Kaggle is used in CSV format.
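Loading the CSV can be sketched with pandas.read_csv. To keep the sketch self-contained, it reads a small inline stand-in through io.StringIO instead of the real file. The column names follow the Chicago crimes export as we understand it but should be treated as assumptions. The rows are hypothetical placeholders, not records from the dataset.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file; with the real data this would be
# pd.read_csv(path_to_csv), where path_to_csv is wherever the Kaggle file was saved.
csv_text = """ID,Date,Block,Primary Type,Location Description,Arrest,District,Community Area,Latitude,Longitude
1,05/03/2016 11:40:00 PM,000XX W EXAMPLE ST,THEFT,STREET,False,1,32,41.88,-87.63
2,05/04/2016 09:15:00 AM,001XX N SAMPLE AVE,BATTERY,RESIDENCE,True,2,35,41.83,-87.62
3,05/04/2016 02:30:00 PM,000XX W EXAMPLE ST,THEFT,STREET,False,1,32,41.88,-87.63
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (3, 10)
print(df.dtypes)  # Arrest is parsed as bool, coordinates as float
```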

Data Preprocessing

The dataset contains 10k entries. Null values are removed using df = df.dropna(), where df is the data frame. The categorical attributes (Location, Block, Crime Type, Community Area) are converted into numeric values using LabelEncoder. The date attribute is split into new attributes such as month and hour, which can be used as features for the model.
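The three preprocessing operations described above can be sketched as follows: dropping nulls, splitting the date into Month and Hour, and label encoding. The rows are hypothetical. The date format string assumes timestamps like "05/03/2016 11:40:00 PM" and would need adjusting if the file differs. One LabelEncoder is kept per column so that codes can later be mapped back to names with inverse_transform.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows mirroring a few columns of the crime CSV
df = pd.DataFrame({
    "Date": ["05/03/2016 11:40:00 PM", "01/15/2016 09:05:00 AM", "07/04/2016 02:30:00 PM", None],
    "Block": ["000XX W EXAMPLE ST", "001XX N SAMPLE AVE", "000XX W EXAMPLE ST", "002XX S TEST RD"],
    "Location Description": ["STREET", "RESIDENCE", "STREET", "ALLEY"],
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT"],
})

# Remove rows with null values, as in the text
df = df.dropna()

# Split the date into Month and Hour features (format string is an assumption)
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")
df["Month"] = dates.dt.month
df["Hour"] = dates.dt.hour

# Convert categorical attributes to integer codes, one LabelEncoder per column
encoders = {}
for col in ["Block", "Location Description", "Primary Type"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])
```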

Feature selection

Feature selection determines which attributes are used to build the model. The attributes considered are Block, Location, District, Community Area, X Coordinate, Y Coordinate, Latitude, Longitude, Hour and Month.

Building and Training the Model

After feature selection, the location and month attributes are used for training. The dataset is divided into the pairs xtrain, ytrain and xtest, ytest. The algorithm models are imported from sklearn, and each model is built using model.fit(xtrain, ytrain).

Prediction

After the model is built using the above process, prediction is done using model.predict(xtest). The accuracy is calculated using accuracy_score from sklearn.metrics: metrics.accuracy_score(ytest, predicted).
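The training and prediction steps can be sketched end to end with the same names the text uses: xtrain, ytrain, xtest, ytest, model.fit, model.predict and metrics.accuracy_score. The encoded Location and Month values are randomly generated. The target follows a made-up rule, so the printed accuracy says nothing about the real dataset. KNN with k = 5 is used because it is the algorithm the project selects. The value of k is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 400
# Hypothetical encoded features: Location code (0-3) and Month (1-12)
df = pd.DataFrame({
    "Location": rng.integers(0, 4, n),
    "Month": rng.integers(1, 13, n),
})
# Hypothetical target: crime type determined by location (illustration only)
df["Crime Type"] = df["Location"] % 2

X = df[["Location", "Month"]]
y = df["Crime Type"]
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.2, random_state=0)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(xtrain, ytrain)
predicted = model.predict(xtest)
acc = metrics.accuracy_score(ytest, predicted)
print(acc)
```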

Visualization

The matplotlib library is used to analyze the crime dataset by plotting various graphs.
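A minimal matplotlib sketch of two of the chart types mentioned, a bar chart of crimes per hour and a pie chart of crime types, might look like this. The records are hypothetical, and the output file name crime_overview.png is a placeholder. The non-interactive Agg backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical preprocessed records
df = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT", "THEFT", "BATTERY"],
    "Hour": [23, 9, 14, 23, 2, 14],
})

fig, (ax_bar, ax_pie) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: number of crimes per hour of day
hour_counts = df["Hour"].value_counts().sort_index()
ax_bar.bar(hour_counts.index, hour_counts.values)
ax_bar.set_xlabel("Hour of day")
ax_bar.set_ylabel("Number of crimes")

# Pie chart: share of each crime type
type_counts = df["Primary Type"].value_counts()
ax_pie.pie(type_counts.values, labels=type_counts.index, autopct="%1.0f%%")
ax_pie.set_title("Crimes by type")

fig.tight_layout()
fig.savefig("crime_overview.png")
```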

As the results in the table show, the KNN algorithm achieved the highest accuracy, 0.787, among the algorithms tested, so it is the algorithm to use for predictive modeling.

SVM performed the worst.
For further modelling on unseen data, there is no need to use the other algorithms.

Crime Visualization

This section deals with the analysis done on the dataset and plots the results as various graphs, such as bar, pie and scatter charts.
The analyses done are:

1. Types of crimes committed over time (month/hour).
2. Number of crimes of each type over the whole city of Chicago.
3. Arrested ratio.
4. Crimes committed across different locations.
5. Details of major crimes committed in the city.
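The numbers behind these graphs can be computed with pandas before plotting. The sketch assumes hypothetical records with Primary Type, Month, Arrest and Location Description columns. It also interprets "major crimes" as the most frequent crime types, which is an assumption about what the article means.

```python
import pandas as pd

# Hypothetical records with the columns the analyses need
df = pd.DataFrame({
    "Primary Type": ["THEFT", "BATTERY", "THEFT", "ASSAULT", "THEFT", "BATTERY"],
    "Month": [1, 1, 2, 2, 2, 3],
    "Arrest": [False, True, False, True, False, False],
    "Location Description": ["STREET", "RESIDENCE", "STREET", "STREET", "STORE", "RESIDENCE"],
})

# Crime types over time: counts per month and type (rows = months, columns = types)
by_month = df.groupby(["Month", "Primary Type"]).size().unstack(fill_value=0)

# Number of crimes of each type across the city
by_type = df["Primary Type"].value_counts()

# Arrested ratio: fraction of records where an arrest was made
arrest_ratio = df["Arrest"].mean()

# Crimes across different locations
by_location = df["Location Description"].value_counts()

# Major crimes (assumed meaning): the most frequent types, top 2 here
major = by_type.head(2)
```

Each result is a Series or DataFrame that can be passed straight to a bar, pie or line plot.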

CONCLUSION 

With the help of machine learning technology, it has become easier to find relations and patterns in various data. The work in this project mainly revolves around predicting the type of crime that may happen, given the location where it occurs. Using machine learning, we built a model from a training dataset that underwent data cleaning and data transformation. The model predicts the type of crime with an accuracy of 0.789. Data visualization helps in analyzing the dataset. The graphs include bar, pie, line and scatter graphs, each with its own characteristics. We generated many graphs and found interesting statistics that helped in understanding the Chicago crimes dataset and can help capture the factors that keep society safe.

 

 

Mifratech website: https://www.mifratech.com/public/