Let’s pretend it’s 2019… and you still can make travel plans
Regression modeling for AirBnB pricing
Intro
Hello there.
How are you? Do you happen to be looking for an AirBnB to rent in NYC… in 2019, before the pandemic hit?
Great! You’re in luck because I have the app for you. Don’t worry, I’m not going to try to sell anything in this article. All you need to do is hop in your time machine to the days before the pandemic, or the “plandemic”… just kidding. This is not going to be a silly conspiracy rant!
This article is actually a showcase of a pretty neat app that I had the privilege to help develop.
Sourcing the data

Why did we build an app with 2019 data? Let me explain. To get started ASAP, we looked for the most robust dataset we could find quickly, and we found it on kaggle.com. Check it out here on the Kaggle website. That dataset contained 2019 NYC AirBnB rental data, so we used it to train a regression model that suggests an optimal price for an AirBnB rental in NYC.
The dataset was well suited to the task and large enough: it had 48,895 rows and 14 columns. Some members of the team would have preferred a larger dataset, but time was of the essence. So we used what we found, and it was more than enough to train a robust model.
For whom did we develop this app?
Now, we actually developed this app as a tool for property owners to determine a competitive price at which to list their property for rent on AirBnB. However, I was feeling nostalgic for the times before travel was impeded by a global pandemic. From that nostalgia, I realized this app could also work for someone trying to determine a fair price to pay.
…Oh, to be a tourist again.
The code behind the app
To develop this app we took a “divide and conquer” approach and applied an “agile-esque” workflow. We divided up the front-end, back-end and data science tasks amongst ourselves. This was great. I was very grateful to be developing this app with such a large team. Doing this alone would have been a pain in the butt, but with that many people we were able to break the project down into tasks so small that the work almost felt trivial.
For example, I was charged with making the data visualization and writing the unit tests. Those tests covered the app functions used to connect to the API and checked that the model returned the correct JSON output. I’ll go into more detail on the data visualization below, but you can check out the repository containing my unit tests and the rest of the code for the app here.
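To give a rough idea of what a test for the JSON output can look like, here is a minimal sketch using Python's built-in unittest module. The function name predict_price, the "predicted_price" key and the fixed price are all hypothetical stand-ins for illustration; the actual tests live in the repository linked above and may look quite different.

```python
import json
import unittest


def predict_price(listing):
    """Hypothetical stand-in for the app's prediction function.

    The real app would call the trained model; this stub returns a fixed
    price so the test can focus on the shape of the JSON response.
    """
    return json.dumps({"predicted_price": 150.0})


class TestPredictionResponse(unittest.TestCase):
    def test_returns_valid_json_with_numeric_price(self):
        raw = predict_price({"neighbourhood_group": "Brooklyn", "minimum_nights": 2})
        payload = json.loads(raw)  # raises ValueError if the output is not valid JSON
        self.assertIn("predicted_price", payload)
        self.assertIsInstance(payload["predicted_price"], (int, float))
        self.assertGreaterEqual(payload["predicted_price"], 0)


# Run the test case programmatically so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPredictionResponse)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The idea is simply to parse the response and assert on its structure, so a change that breaks the JSON contract fails loudly before it reaches the front end.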
Like I mentioned, we took a “divide and conquer” approach to the whole project, so another member took on the task of training the model. He did a good job. He started with a linear regression model and then compared it against other model types, such as random forest and gradient boosting, to determine which would provide the best results. All of this was done using sklearn. The code for the modeling can be found here.
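For a sense of what that kind of comparison looks like in sklearn, here is a small sketch. It is not my teammate's code: the data is synthetic (three made-up numeric features and a made-up price formula), and the choice of mean absolute error on a held-out split as the yardstick is my assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, not the Kaggle dataset).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 3))
y = 100 + 80 * X[:, 0] + 40 * X[:, 1] ** 2 + rng.normal(0, 5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The three model families mentioned in the article.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
}

# Fit each candidate on the training split and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, model.predict(X_test))

best_name = min(scores, key=scores.get)
for name, mae in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {mae:.2f}")
print("Lowest test error:", best_name)
```

Which model comes out ahead depends entirely on the data, so don't read anything into the ranking on this toy example.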

The developers working on the frontend did a great job on the look and feel of the app. It is beautifully humble in appearance, yet its predictive power is quite strong. To try the app yourself, check it out at https://airbnb-sigma.vercel.app/
It’s still a prototype and takes about 40 seconds to return a price when making the first submission because it’s training the model anew when submitting for the first time. Subsequent submissions return a price almost instantaneously. Not ideal, I know, but keep in mind this is still a prototype.
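The "slow first call, fast afterwards" behavior is what you get when a model is trained lazily and then kept in memory. Here is a toy sketch of that pattern using functools.lru_cache. I'm not claiming this is how our back end does it: get_model, predict, the sleep that stands in for training time and the pricing rule are all made up for illustration.

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def get_model():
    """Train the model once and cache it for later calls.

    The sleep is a stand-in for real training time, and the returned
    function is a toy pricing rule, not the app's actual model.
    """
    time.sleep(0.2)  # placeholder for the slow training step
    return lambda minimum_nights: 100.0 + 5.0 * minimum_nights


def predict(minimum_nights):
    model = get_model()  # trains on the first call, cached afterwards
    return model(minimum_nights)


start = time.perf_counter()
first = predict(2)
first_elapsed = time.perf_counter() - start

start = time.perf_counter()
second = predict(3)
second_elapsed = time.perf_counter() - start

print(f"first call: {first_elapsed:.3f}s, second call: {second_elapsed:.3f}s")
```

A natural next step for a prototype like this would be to train ahead of time and load a saved model at startup, so even the first submission comes back quickly.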

Code for the map visual
Unfortunately, the dataset was too large to upload here. If you’d like to play with the map, zoom in and out, and see all the data for each plotted point, check out the Colab notebook in which I made it at the link here. Anyway, below are some screenshots of the data visualization.




If you’d like to recreate the map for yourself, just download the dataset and plot the data points using plotly. Below is the code I wrote to create the map with plotly. You’ll have to do some cleaning of the data first, but that’s a trivial task for a data scientist. Have fun and thanks for reading!
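import plotly.express as px
import pandas as pd


def visual(df):
    """
    This takes the cleaned NYC rental dataset (loaded from the CSV)
    and makes an interactive heatmap.

    ### Response
    JSON string to render with react-plotly.js
    """
    # One point per listing, colored by price, with listing details on hover
    fig = px.scatter_mapbox(
        df, lat="latitude", lon="longitude", color="price",
        hover_data=["neighbourhood_group", "neighbourhood", "id"],
        color_discrete_sequence=["fuchsia"],
        zoom=10, height=125)
    fig.update_layout(
        width=1000,
        height=1000,
        # the original listing is cut off here; any further layout
        # arguments are not shown
    )
    # Assumed from the docstring: hand the figure back as a JSON string
    return fig.to_json()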