USED CAR PRICE PREDICTION

Isaac Raja
Mar 26, 2021

ABSTRACT

Cars are being sold more than ever. In developing countries, buyers are adopting a lease culture instead of buying new cars because of affordability, and used car sales are rising rapidly as a result.

Car sellers sometimes take advantage of this demand by listing unrealistic prices.

There is therefore a need for a model that can assign a price to a vehicle by evaluating its features and taking the prices of other cars into consideration.

INTRODUCTION

The price of a new car is fixed by the manufacturer, with some additional costs imposed by the government in the form of taxes, so customers buying a new car can be confident that the money they invest is well spent. But because new cars have become more expensive and many customers lack the funds to buy them, used car sales are increasing globally. Predicting the prices of used cars is therefore an interesting and much-needed problem to address.

Customers can easily be exploited when sellers set unrealistic prices for used cars, and many fall into this trap. A used car price prediction system that determines the worth of a car from a variety of features is therefore a real necessity.

Tools and libraries required for the proposed system:

Notebook editor (Jupyter or Spyder)

Python programming

Scikit-learn (machine learning, supervised learning)

Visualization in Tableau

Flask (web application)

DATA COLLECTION

Description of the Data

The data used in this project was downloaded from Kaggle. It was uploaded to Kaggle by Austin Reese, a Kaggle.com user, who scraped it from Craigslist for non-profit purposes. It contains most of the relevant information that Craigslist provides on car sales, including columns like price, condition, manufacturer, and 10 other categories.

The attributes of the used car dataset considered here are:

1. Name

2. Year

3. Transmission

4. Owner Type

5. Present price

6. Selling price

7. Fuel type

8. Seller type

Name: Brand of the car

Year: Registration year

Transmission: Whether the car is automatic or manual

Owner type: Single owner or second hand

Mileage: How many kmpl (kilometres per litre) the car gives

Fuel: Type of fuel

Seller type: Whether the seller is an individual or not

Price: The price of the car

Source and Methods of Collecting Data

The dataset for used car price prediction was collected from Kaggle.

Data Source — https://www.kaggle.com/avikasliwal/used-cars-price-prediction

Preprocessing and Feature Selection Steps

Handling Null values:

The dataset contains some null values. Dropping the affected rows would discard data that is useful for prediction, so the proposed system fills the null values instead, using the fill methods in pandas.
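A minimal sketch of this filling step on a small hypothetical frame. The column names and values are illustrative and not taken from the actual dataset, and the choice of median and mode as fill values is an assumption, since the exact fill strategy is not stated here.

```python
import pandas as pd

# Hypothetical toy frame; names and values are for illustration only.
df = pd.DataFrame({
    "Mileage": [18.5, None, 21.0, 16.5],
    "Fuel_Type": ["Petrol", "Diesel", None, "Petrol"],
})

# Numeric column: fill the gap with the median of the known values.
df["Mileage"] = df["Mileage"].fillna(df["Mileage"].median())

# Categorical column: fill the gap with the most frequent value (the mode).
df["Fuel_Type"] = df["Fuel_Type"].fillna(df["Fuel_Type"].mode()[0])

# Count of remaining nulls per column.
print(df.isnull().sum())
```

Filling keeps every row available for training, at the cost of introducing values that were not actually observed.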

Handling Categorical data:

In the proposed system, the commodity and date columns are of object type. The date column is converted with the datetime function. The other important feature is commodity, which has three categories.

Once the null values are filled, pd.get_dummies is applied to split the categorical columns into one column per category, holding 0s and 1s, for prediction.

With the dataset shaped for prediction, the values of each column are checked.

df6 = pd.get_dummies(df5, columns=['Fuel_Type', 'Transmission', 'Owner_Type'])
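To see what this encoding produces, here is a small sketch on a hypothetical three-row frame. The rows, the category values, and the `Year` column are made up for illustration. `dtype=int` is passed so that the new columns hold 0s and 1s rather than the booleans that recent pandas versions return by default.

```python
import pandas as pd

# Hypothetical stand-in for df5; the real frame has more rows and columns.
df5 = pd.DataFrame({
    "Year": [2014, 2017, 2011],
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual"],
    "Owner_Type": ["First", "Second", "First"],
})

# Each listed column is replaced by one 0/1 column per category.
df6 = pd.get_dummies(
    df5,
    columns=["Fuel_Type", "Transmission", "Owner_Type"],
    dtype=int,
)

print(df6.columns.tolist())
```

Columns that are not listed, such as `Year` here, pass through unchanged.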

Splitting:

The dataset is split into X and Y.

X holds the input features and Y holds the target, the selling price. Each is then divided into a training set and a test set.

The RANDOM FOREST REGRESSOR and LINEAR REGRESSION algorithms are used for better accuracy.

In the future, different algorithms can be tried to achieve better accuracy.

CODE:

# Features: every column except the target
X = df.drop('Selling_Price', axis=1)

# Target: the selling price
Y = df['Selling_Price']

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=9)

Model Architecture

RANDOM FOREST REGRESSOR:

A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
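The averaging idea can be checked with a short sketch. The data here is synthetic (an assumption, not the Kaggle dataset), and `n_estimators=100` is simply scikit-learn's default number of trees, not necessarily the setting used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: price falls with age and kilometres driven.
rng = np.random.default_rng(9)
age = rng.uniform(0, 15, size=300)       # years
km = rng.uniform(5, 150, size=300)       # thousands of km
price = 10 - 0.4 * age - 0.02 * km + rng.normal(0, 0.3, size=300)

X = np.column_stack([age, km])
X_train, X_test, Y_train, Y_test = train_test_split(
    X, price, test_size=0.15, random_state=9
)

# n_estimators is the number of trees whose predictions are averaged.
rf = RandomForestRegressor(n_estimators=100, random_state=9)
rf.fit(X_train, Y_train)

# The forest's prediction is the mean of the individual trees' predictions.
tree_mean = np.mean([tree.predict(X_test) for tree in rf.estimators_], axis=0)
forest_pred = rf.predict(X_test)

print("R^2 on held-out data:", rf.score(X_test, Y_test))
```

Each tree sees a different bootstrap sample, so individual trees overfit in different ways and the average is more stable than any single tree.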

An end user can easily reuse this setup and reproduce the results, because all the null values are handled with proper and transparent code before modeling.

In this project, bar plot charts are used to summarize the dataset by category for the non-expert user.

LINEAR REGRESSOR:

Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task.

Regression models a target prediction value based on independent variables. It is mostly used for finding out the relationship between variables and for forecasting.

Ridge

A Ridge regressor is basically a regularized version of the linear regressor: to the original cost function of the linear regressor we add a regularization term, which forces the learning algorithm to fit the data while keeping the weights as low as possible.

Lasso

Lasso regression is similar to Ridge regression, except that the penalty is the sum of the absolute values of the coefficients instead of the sum of their squares. Unlike Ridge regression, Lasso regression can completely eliminate a variable by reducing its coefficient to 0.
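The difference between the two penalties shows up clearly on a small synthetic problem. In the sketch below the data and the `alpha` values are assumptions chosen for illustration: only the first two of five features influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# Only the first two features matter; the other three are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

# alpha sets the strength of the penalty in both models.
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
```

Ridge shrinks the noise coefficients toward zero but leaves them non-zero, while Lasso sets them exactly to zero, which is why Lasso is sometimes used as a feature selection step.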

Training Overview

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=9)

MODEL SYNTAX:

from sklearn.ensemble import RandomForestRegressor

The random forest regressor is fitted on X_train and Y_train and then used for prediction. Its errors (MAE, MSE, RMSE) are listed in the ACCURACY section below.

from sklearn.linear_model import LinearRegression

lr = LinearRegression()

lr.fit(X_train, Y_train)

ACCURACY:

The results below show the errors obtained with each model. A lower error value indicates a better model.

LINEAR REGRESSOR (LASSO)

MAE (mean absolute error): 1.334%

MSE (mean squared error): 4.4731%

RMSE (root mean squared error): 2.114%

RANDOM FOREST REGRESSOR

MAE (mean absolute error): 0.5378%

MSE (mean squared error): 0.8004%

RMSE (root mean squared error): 0.8946%

LINEAR REGRESSOR (RIDGE)

MAE (mean absolute error): 1.133%

MSE (mean squared error): 3.2738%

RMSE (root mean squared error): 1.809%
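For reference, error values like these can be computed with scikit-learn's metric functions. The sketch below uses made-up true and predicted prices, so its numbers are unrelated to the results above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical true and predicted prices, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)   # mean of |error|
mse = mean_squared_error(y_true, y_pred)    # mean of error squared
rmse = np.sqrt(mse)                         # back in the units of the target

print(mae, mse, rmse)
```

RMSE is always the square root of MSE, which gives a quick consistency check on any reported pair of values.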

Deployment Process

Once the product runs correctly on a local server, the next step is to deploy it, because users can easily access the product after deployment.

In the proposed system, the application is first run on a local server, and the code and files are then uploaded to GitHub.

Local Server running:

On the local server, the application runs on localhost at http://127.0.0.1:5000/.
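A minimal sketch of how such a Flask application might be structured. The `/predict` route, the JSON payload, and the `predict_price` placeholder are hypothetical, and the actual application may differ.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price(features):
    """Hypothetical placeholder: a real app would call the trained model here."""
    return 0.0


@app.route("/predict", methods=["POST"])
def predict():
    # Read the car's features from the JSON body of the request.
    features = request.get_json()
    return jsonify({"predicted_price": predict_price(features)})

# Calling app.run() starts Flask's development server,
# which listens on http://127.0.0.1:5000/ by default.
```

In a real deployment, `predict_price` would load the saved model once at startup and convert the incoming features into the same encoded columns used during training.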

CONCLUSION

Summary

This paper evaluates used car price prediction using a Kaggle dataset, which gives an accuracy of 20% for test data and 80% for train data.

The most relevant features used for this prediction are price, kilometers, and transmission type, obtained by filtering out outliers and irrelevant features of the dataset.

Being a sophisticated model, Random Forest gives good accuracy in comparison to prior work using these datasets.

Limitations and Future Work

Keeping the current model as a baseline, we intend to use some advanced techniques like fuzzy logic and genetic algorithms to predict car prices as our future work.

We intend to develop a fully automatic, interactive system that contains a repository of used-cars with their prices.

This would enable a user to find the price of a similar car through a recommendation engine, which we plan to work on in the future.

As a next step, the product will be deployed on Heroku so that it can be used from anywhere over the internet. Once deployed on Heroku, it will be a fully finished product for the user.
