USED CAR PRICE PREDICTION
ABSTRACT
Cars are being sold more than ever. In developing countries, many buyers adopt a lease culture instead of buying a new car because of affordability. As a result, used-car sales are rising rapidly.
Car sellers sometimes take advantage of this situation by listing unrealistic prices owing to the demand.
There is therefore a need for a model that can assign a price to a vehicle by evaluating its features while taking the prices of other cars into consideration.
INTRODUCTION
The prices of new cars are fixed by the manufacturer, with some additional costs imposed by the government in the form of taxes, so customers buying a new car can be assured that the money they invest is worthwhile. However, because of the increased prices of new cars and customers' inability to afford them, used-car sales are increasing globally. Predicting the prices of used cars is therefore an interesting and much-needed problem to address.
Customers can be exploited when sellers fix unrealistic prices for used cars, and many fall into this trap. There is therefore a real need for a used-car price prediction system that effectively determines the worth of a car using a variety of features.
Tools and Libraries Required by the Proposed System:
Notebook Editor (Jupyter or Spyder)
Python Programming
Scikit-learn (Machine Learning, Supervised Learning)
Visualization in Tableau
Flask (Web Application)
DATA COLLECTION
Description of the Data
The data used in this project was downloaded from Kaggle. It was uploaded to Kaggle by Austin Reese, a Kaggle.com user, who scraped the data from Craigslist for non-profit purposes. It contains most of the relevant information that Craigslist provides on car sales, including columns such as price, condition, manufacturer, and 10 other categories.
The used-car dataset has 10 attributes, including:
1. Name
2. Year
3. Transmission
4. Owner Type
5. Present price
6. Selling price
7. Fuel type
8. Seller type
Name: Brand of the car
Year: Registration year
Transmission: Whether the car is Automatic or Manual
Owner type: Single owner or second hand
Mileage: How many kmpl (kilometres per litre) the car gives
Fuel: Type of fuel
Seller type: Whether the seller is an individual or not
Price: The price of the car
Source and Methods of Collecting Data
The dataset for used-car price prediction was collected from Kaggle.
Data Source: https://www.kaggle.com/avikasliwal/used-cars-price-prediction
Preprocessing and Feature Selection Steps
Handling Null values:
The dataset contains some null values. The proposed system fills them rather than dropping them, because dropping rows with null values discards data and is not an efficient basis for future prediction. The filling is done with pandas.
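A minimal sketch of the filling approach on a small illustrative frame. The column names and the choice of fill values (median for a numeric column, most frequent value for a categorical column) are assumptions, since the report does not state which fill values were used.

```python
import numpy as np
import pandas as pd

# Toy frame with missing values; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "Mileage": [18.0, np.nan, 21.5, 15.0],
    "Fuel_Type": ["Petrol", "Diesel", None, "Petrol"],
})

# Numeric column: fill missing entries with the column median.
df["Mileage"] = df["Mileage"].fillna(df["Mileage"].median())

# Categorical column: fill missing entries with the most frequent value.
df["Fuel_Type"] = df["Fuel_Type"].fillna(df["Fuel_Type"].mode()[0])

# Count of null values remaining after filling.
n_missing = int(df.isnull().sum().sum())
```

Filling keeps all four rows available for training, whereas dropping rows with nulls would have discarded two of them.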
Handling Categorical data:
In the proposed system, the commodity and date columns have object type. The date column is converted with the help of a datetime function. The other important feature is commodity, which has three categories.
Once the null values are filled, pd.get_dummies is applied to split the categorical columns into per-category columns of 0s and 1s for prediction.
Once the dataset is shaped for prediction, the values of each column are checked.
df6 = pd.get_dummies(df5, columns=['Fuel_Type', 'Transmission', 'Owner_Type'])
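The encoding step above can be illustrated on a toy frame. The rows below are made-up values, and dtype=int is passed so that the new columns hold 0s and 1s (recent pandas versions return booleans by default).

```python
import pandas as pd

# Toy stand-in for df5; the values are illustrative assumptions.
df5 = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Transmission": ["Manual", "Automatic", "Manual"],
    "Owner_Type": ["First", "Second", "First"],
    "Selling_Price": [3.5, 7.2, 4.1],
})

# Each listed column is replaced by one 0/1 indicator column per category.
df6 = pd.get_dummies(
    df5, columns=["Fuel_Type", "Transmission", "Owner_Type"], dtype=int
)

encoded_columns = list(df6.columns)
```

Each original categorical column disappears and is replaced by columns such as Fuel_Type_Petrol and Fuel_Type_Diesel, while Selling_Price is left unchanged.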
Splitting:
The dataset is split into X and Y.
X holds the input features and Y holds the target (Selling_Price); each is then divided into training and test subsets.
The Random Forest Regressor and Linear Regression algorithms are used for better accuracy.
In future, different algorithms may be tried to obtain better accuracy.
CODE:
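from sklearn.model_selection import train_test_split
X = df.drop('Selling_Price', axis=1)  # input features
Y = df['Selling_Price']  # target: selling price
# Hold out 15% of the rows for testing; random_state makes the split reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=9)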
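A small sketch of what this split produces, using synthetic arrays in place of the real dataset (the array contents are assumptions chosen so that row alignment is easy to check).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 2 features; row i is [2i, 2i + 1] and its target is i.
X = np.arange(200).reshape(100, 2)
Y = np.arange(100)

# Same arguments as in the report: 15% test rows, fixed seed.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.15, random_state=9
)

n_train, n_test = len(X_train), len(X_test)
```

With 100 rows, 85 go to training and 15 to testing, and each feature row stays paired with its own target value after shuffling.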
Model Architecture
RANDOM FOREST REGRESSOR:
A random forest is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
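The averaging behaviour described above can be sketched on synthetic data. The data-generating formula, the number of trees, and the seeds are assumptions for illustration and are not the settings used in this project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on the first two of three features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
Y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, size=300)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.15, random_state=9
)

# Fit a forest of 100 trees, each trained on a bootstrap sample of the training rows.
rf = RandomForestRegressor(n_estimators=100, random_state=9)
rf.fit(X_train, Y_train)
forest_pred = rf.predict(X_test)

# The forest prediction is the mean of the individual trees' predictions.
tree_avg = np.mean([tree.predict(X_test) for tree in rf.estimators_], axis=0)
```

Comparing forest_pred with tree_avg shows that the ensemble output is simply the average over its trees, which is what smooths out the over-fitting of any single tree.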
An end user can easily use this setup and reproduce it, because all null values are handled before modeling with clear and transparent code.
In the proposed system, bar plots are used to summarize the dataset by category for the non-expert user.
LINEAR REGRESSOR:
Linear Regression is a machine learning algorithm based on supervised learning. It performs a regression task.
A regression model predicts a target value based on independent variables. It is mostly used for finding out the relationship between variables and for forecasting.
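As a brief illustration of how a linear model relates independent variables to a target, the sketch below fits LinearRegression to noise-free toy data generated from a known formula (the data and the formula are assumptions for demonstration only).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated exactly from y = 3*x0 + 2*x1 + 1.
X = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 3.0]])
Y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0

lr = LinearRegression()
lr.fit(X, Y)

# Forecast for a new, unseen input.
prediction = lr.predict(np.array([[6.0, 1.0]]))
```

The fitted coefficients (lr.coef_) and intercept (lr.intercept_) recover the relationship between each variable and the target, and predict uses them to forecast new values.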
Ridge
A Ridge regressor is basically a regularized version of the Linear Regressor. To the original cost function of the linear regressor we add a regularization term (the sum of squared coefficients, scaled by a strength parameter), which forces the learning algorithm to fit the data while keeping the weights as low as possible.
Lasso
Lasso Regression is similar to Ridge Regression, except that the penalty is the sum of the absolute values of the coefficients (L1) in place of the sum of their squares (L2). Unlike Ridge Regression, Lasso Regression can completely eliminate a variable by reducing its coefficient to 0.
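The difference between the two penalties can be seen in a short sketch on synthetic data. The alpha values, the seed, and the data-generating formula are assumptions chosen for illustration, not the settings used in this project.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data: five features, but only the first two influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.1, size=200)

ols = LinearRegression().fit(X, Y)   # no penalty
ridge = Ridge(alpha=10.0).fit(X, Y)  # L2 penalty: shrinks all weights
lasso = Lasso(alpha=0.5).fit(X, Y)   # L1 penalty: can set weights exactly to 0

# Number of coefficients that Lasso eliminated entirely.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
```

Ridge shrinks every coefficient relative to the unpenalized fit but keeps all of them nonzero, while Lasso drives the coefficients of the three irrelevant features exactly to zero, acting as a form of feature selection.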
Training Overview
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.15, random_state=9)
MODEL SYNTAX:
from sklearn.ensemble import RandomForestRegressor
The Random Forest regressor is fitted on X_train and Y_train and then used for prediction. The picture below shows the error metrics (MAE, MSE, RMSE) obtained with the Random Forest regressor.
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, Y_train)  # same variable names as returned by train_test_split
ACCURACY:
The results below were obtained by the proposed system. The different models are compared by their error values, where a lower error indicates a better model.
Model                      MAE      MSE      RMSE
Linear Regressor (Lasso)   1.334    4.4731   2.114
Random Forest Regressor    0.5378   0.8004   0.8946
Linear Regressor (Ridge)   1.133    3.2738   1.809
(MAE = mean absolute error, MSE = mean squared error, RMSE = root mean square error. The values are error magnitudes rather than percentages; lower is better.)
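A minimal sketch of how these three error metrics are computed with scikit-learn. The y_true and y_pred arrays are made-up values for illustration and are not the project's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Illustrative true and predicted prices (assumed values).
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mae = mean_absolute_error(y_true, y_pred)  # mean of |error|
mse = mean_squared_error(y_true, y_pred)   # mean of error squared
rmse = np.sqrt(mse)                        # square root of MSE, in the target's units
```

Because RMSE is the square root of MSE, the two reported values for each model should agree; for the Random Forest results, the square root of 0.8004 is approximately 0.8946.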
Deployment Process
Once the product runs correctly on a local server, the next step is to deploy the project, because after deployment users can access it easily.
The proposed system was first run on a local server, and the code and files were then uploaded to GitHub.
Running on the local server:
On the local server, the application runs on localhost at http://127.0.0.1:5000/
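A minimal sketch of a Flask application served at that address. The route, the response payload, and the run_local helper are hypothetical placeholders; the project's actual application would load the trained model and return a predicted price.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/")
def index():
    # Placeholder response; the real app would serve the price-prediction page.
    return jsonify(status="ok")


def run_local():
    # Hypothetical helper: starts Flask's development server on 127.0.0.1:5000.
    app.run(host="127.0.0.1", port=5000)
```

Calling run_local() starts the development server, after which the page is reachable in a browser at http://127.0.0.1:5000/.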
CONCLUSION
Summary
This paper evaluates used-car price prediction using a Kaggle dataset, which gives an accuracy of 20% for test data and 80% for train data.
The most relevant features used for this prediction are price, kilometer, and Transmission Type, obtained by filtering out outliers and irrelevant features of the dataset.
Being a sophisticated model, Random Forest gives good accuracy in comparison to prior work using these datasets.
Limitations and Future Work
Keeping the current model as a baseline, we intend to use some advanced techniques like fuzzy logic and genetic algorithms to predict car prices as our future work.
We intend to develop a fully automatic, interactive system that contains a repository of used-cars with their prices.
This will enable a user to find the price of a similar car using a recommendation engine, which we will work on in the future.
As a next step, we intend to deploy this product on Heroku so that it can be used anywhere through the internet. After deployment on Heroku, it will be a fully finished product for the user.