How New York City Prioritizes Solving its Water Problems.

The GitHub repository with all of the data analysis below is available HERE.

The Explainer Notebook for all of the analysis below is available HERE.

We analyzed more than 24 million 311 service requests from New York City, focusing on water-related complaints over 5 years, to understand how the city prioritizes solving its problems and whether it has improved. Our project seeks to understand public policy patterns and share findings related to these problems.

Introduction

New York City has more than 8 million residents, making it one of the largest cities in the United States. Managing a city of this magnitude is a big challenge, especially when it comes to solving the problems of all its residents. To systematize these requests, the city operates a hotline called 311, where each resident can report non-emergency problems to city institutions.

Luckily for us, the city of New York shares all non-emergency requests. The dataset is massive: from 2010 to 2021 there were more than 24 million requests across different institutions. The types of requests are extensive, so we decided to filter these data points down to water-related problems, be it difficulties in distribution, water quality, water waste, etc. We chose the topic of water because the distribution and quality of this basic service is a real challenge for the City of New York. According to the city, disruptions of water distribution in the current networks, sediments, and odor, taste and color problems are serious challenges to be faced in order to achieve the quality goals of the Environmental Protection Agency (EPA).

Additionally, our focus is to understand how an institution acts when it comes to solving these problems and what the influencing factors are. It is not news that New York City has one of the highest costs of living in the world, so our main question is: does New York City prioritize solving water problems in its most expensive neighborhoods?

To get a “proxy” for what counts as a highly valued neighborhood, we used real estate sales data made available by the New York City Department of Finance. With it we can evaluate the median price of real estate in different ZIP codes over 10 years and cross-reference it with the complaints made through 311.
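The cross-reference described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the code from the repository: the field names (`zip`, `year`, `price`), the sample rows, and the choice to skip zero-price sales are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical sample rows; the real inputs would be the Rolling Sales
# and 311 Service Requests datasets.
sales = [
    {"zip": "10001", "year": 2020, "price": 1_000_000},
    {"zip": "10001", "year": 2020, "price": 1_200_000},
    {"zip": "10001", "year": 2020, "price": 0},  # zero-price transfer
    {"zip": "10451", "year": 2020, "price": 400_000},
    {"zip": "10451", "year": 2020, "price": 500_000},
    {"zip": "10451", "year": 2020, "price": 450_000},
]
complaints = [
    {"zip": "10001", "year": 2020},
    {"zip": "10451", "year": 2020},
    {"zip": "10451", "year": 2020},
    {"zip": "10451", "year": 2020},
]


def median_price_by_zip_year(rows):
    """Median sale price for every (ZIP code, year) pair."""
    prices = defaultdict(list)
    for row in rows:
        if row["price"] > 0:  # assumption: zero-price sales are not market sales
            prices[(row["zip"], row["year"])].append(row["price"])
    return {key: median(values) for key, values in prices.items()}


def complaints_by_zip_year(rows):
    """Number of complaints for every (ZIP code, year) pair."""
    return Counter((row["zip"], row["year"]) for row in rows)


def cross_reference(sales_rows, complaint_rows):
    """Put median price and complaint count side by side per (ZIP code, year)."""
    medians = median_price_by_zip_year(sales_rows)
    counts = complaints_by_zip_year(complaint_rows)
    return {
        key: {"median_price": price, "complaints": counts.get(key, 0)}
        for key, price in medians.items()
    }


table = cross_reference(sales, complaints)
for key, row in sorted(table.items()):
    print(key, row)
```

Using the median rather than the mean keeps a handful of very expensive sales from dominating the price of a ZIP code.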

And last but not least, to back up the number of complaints and the valuation of real estate in different neighborhoods, it makes sense to look at data collected on water quality in New York City, so we can understand these dynamics over the years.

Understanding the City of New York by the Numbers

As has already been said, living in New York City is expensive, and it has become even more so over the years. The Big Apple is famous for its skyscrapers, the rush of everyone, and the number of things to do in a single day. However, we must show through the data how expensive the city has become. Over the last 5 years we can see a big rise in the average price of real estate, even more so at the beginning of the pandemic. Another interesting fact is that, when we cross-checked the volume of complaints from residents, the claims dropped dramatically in that same year (2020). One hypothesis for this behaviour is the paralysis of society after the pandemic was declared, which made people seek remote channels of work and communication, so many left the city, creating a modern-era exodus.

[Figure: average real estate price and volume of complaints per year]

We understand that living in the city is very expensive, but do these prices reflect the city as a whole? In the graph below you can see that the cost of housing is concentrated in Manhattan and that the other neighbourhoods level off at a much lower price than the areas on the island.

[Figure: cost of housing across the city, concentrated in Manhattan]

Now we need to see whether real estate prices are reflected in the number of complaints made by residents, and how that relationship changes over the years, because this may show how the government is acting to solve these problems and whether it is focusing on any one region. This brings us to the main question of this article: does the price of real estate affect the number of complaints reported in New York City? In the graph below we can see the distribution of complaints and average prices over 5 years, and some very interesting things stand out!

[Figure: distribution of complaints and average prices by area over 5 years]

Notice that High Bridge and Morrisania, Bronx Park and Fordham, and Central Bronx show an inverse relationship between property prices and complaints during 2020 and 2021: all of these areas suffered a devaluation while the number of complaints stayed the same or increased over the same period. The two variables seem to be related, which gives us the opportunity to investigate further the water-related complaints in the city and how they affect certain areas!

Investigating Water Complaints

We chose water-related problems because New York City has the largest unfiltered water system in the world. With 8 million inhabitants depending on it, it is almost frightening to think of the risk this could pose to people’s health and the environment. Therefore, we decided to investigate these complaints and focus on which regions are affected in order to share our findings.

To understand a little better the trends in the types of water complaints in the city, we built a ratio system using the probability of a complaint occurring in different regions. The details of how we set this up are available in our repository. Basically, if a neighbourhood has a ratio above 1 for a type of complaint, it has proportionally more requests of that type than the city as a whole, and a ratio below 1 means the opposite. This visualization is really good for understanding the bottlenecks of each region in the city and, consequently, the challenges they have.
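One way to build such a ratio is sketched below. It is our own assumption of how the ratio could be defined, not necessarily the exact formula used in the repository: the share of a complaint type inside a region is divided by the share of that type in the whole city. The region and complaint names are illustrative.

```python
from collections import Counter


def complaint_ratios(records):
    """Ratio of P(type | region) to P(type | city) for every (region, type).

    `records` is an iterable of (region, complaint_type) pairs.
    A ratio above 1 means the region reports that type more often,
    proportionally, than the city as a whole.
    """
    city_counts = Counter()
    region_type_counts = Counter()
    region_totals = Counter()
    for region, complaint_type in records:
        city_counts[complaint_type] += 1
        region_type_counts[(region, complaint_type)] += 1
        region_totals[region] += 1

    total = sum(city_counts.values())
    ratios = {}
    for (region, complaint_type), count in region_type_counts.items():
        p_region = count / region_totals[region]
        p_city = city_counts[complaint_type] / total
        ratios[(region, complaint_type)] = p_region / p_city
    return ratios


# Hypothetical records, only to show the shape of the input.
records = (
    [("Bronx", "Drinking Water")] * 3
    + [("Bronx", "Sewer")]
    + [("Manhattan", "Drinking Water")]
    + [("Manhattan", "Sewer")] * 3
)
ratios = complaint_ratios(records)
print(ratios[("Bronx", "Drinking Water")])      # above 1: over-represented
print(ratios[("Manhattan", "Drinking Water")])  # below 1: under-represented
```

Dividing by the city-wide share makes regions of very different sizes comparable, since the ratio does not depend on the total number of complaints in a region.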

[Figure: complaint ratios by type and region, relative to the city]

Of all the complaints, we will focus on those that involve drinking water because they are the most relevant to this article. Of the four regions with the most complaints about drinking water, 3 are located in the Bronx. With a little research, we found that this issue is not new to the region and that legal action is even being taken to improve the treatment of drinking water, especially the removal of chemicals and other contaminants. Although for now we are focusing on the health of the residents of the region, the presence of these substances in the water affects the entire environment. It is a matter of the order of consequences: if humans suffer from it first, it is logical to think that fauna and flora can suffer as well. Identifying the location and frequency of these problems is essential for public decision makers, and we can share this finding with them.

It is clear that the Bronx has not fared well in terms of water quality. Luckily, we were able to relate the complaints data to the water quality levels of samples collected around the city. With a little bit of stubbornness, and using technology, it was also possible to find the collection locations and compare different regions and their levels of certain chemicals or contaminants.

[Figure: water quality sample collection locations across the city]

The database contains two key measures of water quality. The first is turbidity, a measure of water clarity. The level recommended by the New York government is not to exceed 5 NTU (Nephelometric Turbidity Units). The second measure, one of the most important for the health of living beings, is the level of coliforms in the water: the level of bacteria in 100 ml of water should not exceed 0.10. Most types of coliform are harmless, but some types can cause severe health problems, such as fever, diarrhea and respiratory problems.
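As a small illustration, the two limits quoted above can be applied to individual samples like this. The field names and sample values are hypothetical and do not come from the dataset.

```python
# Limits quoted in the article; the field names below are hypothetical.
TURBIDITY_LIMIT_NTU = 5.0
COLIFORM_LIMIT_PER_100ML = 0.10


def flag_sample(sample):
    """Return which of the two limits a single water sample exceeds."""
    return {
        "turbidity_high": sample["turbidity_ntu"] > TURBIDITY_LIMIT_NTU,
        "coliform_high": sample["coliform_per_100ml"] > COLIFORM_LIMIT_PER_100ML,
    }


# Made-up samples, only to show how the flags behave.
samples = [
    {"site": "A", "turbidity_ntu": 0.8, "coliform_per_100ml": 0.0},
    {"site": "B", "turbidity_ntu": 6.2, "coliform_per_100ml": 0.0},
    {"site": "C", "turbidity_ntu": 1.1, "coliform_per_100ml": 1.0},
]

flagged_sites = [s["site"] for s in samples if any(flag_sample(s).values())]
print(flagged_sites)
```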

We were able to compare the number of complaints per year with the number of incidents in which coliform was above the recommended levels. To stay on the subject of this article, we compare the Bronx to a nearby borough, Manhattan, which is curiously the most expensive part of the city and shares the same water system.
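A comparison of this kind can be sketched as follows: count the samples above the coliform limit per borough and year, count the complaints the same way, and put the two side by side. The record layout and the sample data are assumptions made for the example.

```python
from collections import Counter
from datetime import date

COLIFORM_LIMIT = 0.10  # limit quoted in the article


def incidents_per_year(samples, limit=COLIFORM_LIMIT):
    """Count samples above the coliform limit per (borough, year)."""
    return Counter(
        (s["borough"], s["date"].year) for s in samples if s["coliform"] > limit
    )


def complaints_per_year(complaints):
    """Count complaints per (borough, year)."""
    return Counter((c["borough"], c["date"].year) for c in complaints)


def compare(samples, complaints):
    """Rows of (borough, year, complaints, incidents), sorted by borough and year."""
    incidents = incidents_per_year(samples)
    counts = complaints_per_year(complaints)
    keys = sorted(set(incidents) | set(counts))
    return [
        (borough, year, counts.get((borough, year), 0), incidents.get((borough, year), 0))
        for borough, year in keys
    ]


# Hypothetical records, only to show the shape of the inputs.
samples = [
    {"borough": "Bronx", "date": date(2019, 3, 1), "coliform": 1.0},
    {"borough": "Bronx", "date": date(2019, 6, 1), "coliform": 0.0},
    {"borough": "Bronx", "date": date(2020, 2, 1), "coliform": 2.0},
    {"borough": "Manhattan", "date": date(2019, 3, 1), "coliform": 0.0},
]
complaints = [
    {"borough": "Bronx", "date": date(2019, 3, 2)},
    {"borough": "Bronx", "date": date(2019, 7, 9)},
    {"borough": "Manhattan", "date": date(2019, 5, 5)},
]

rows = compare(samples, complaints)
for row in rows:
    print(row)
```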

[Figure: complaints and coliform incidents per year, Bronx vs. Manhattan]

We can see a clear difference in water quality between the boroughs. Notice that over the 6 years the Bronx had a much higher number of incidents than Manhattan, and that the complaints line up with the levels measured on the same dates. Incredible, isn’t it? It is also clear that Manhattan stops having so many incidents over the years and that its coliform levels seem relatively under control. One hypothesis is that Manhattan is a more valued area of the city and for this reason received a higher priority from the city, even though the coliform levels were already high in the Bronx a year before.

To make this comparison more interactive, we created a tool where you can see the levels of Chlorine, Turbidity, Fluoride and Coliform in all neighborhoods. With this visualization it is possible to see which neighborhoods have the worst averages.

[Interactive visualization: water quality levels by neighborhood]

Predicting Complaints in New York City

As a way to help the New York City government better understand how a claim is created, we developed a machine learning model that determines which variables are most influential in creating a claim. From what the data shows, the government does not have ideal control over the prioritization of requests, and by sharing this model we hope it will be possible for it to review its processes.

[Figure: variables most influential in creating a claim]

We ended up using a random forest model. According to it, complaints about the water system, plumbing complaints and sales volumes are the big influencers in generating new repair orders.

Since this is the union of a lot of data, we put in as many features as possible to try to provide good prediction accuracy.

Our model reached an R² of 60.64% when predicting claims on the test set.
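For readers less familiar with the metric, R² is the share of the variance in the target that the predictions explain: 1 minus the ratio between the squared prediction errors and the squared deviations from the mean. A minimal sketch with made-up numbers (not our results):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up claim counts and predictions, only to illustrate the formula.
y_true = [10, 20, 30, 40]
y_pred = [12, 18, 33, 37]
score = r_squared(y_true, y_pred)
print(f"R² = {score:.3f}")
```

An R² of 1 would mean perfect predictions, while 0 means the model does no better than always predicting the mean.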

Because of the massive amount of data we have and the fact that we are using random forests, the processing requires a lot of computational power, so we limited the maximum tree depth to only 40 levels, which gives a score of 60.5%. In the image below you can see the criteria for choosing each leaf and how they affect our model.
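The setup described above can be sketched with scikit-learn as follows. This is an illustration on synthetic data, not our actual pipeline: the feature names, the data-generating formula, and every parameter other than the depth limit of 40 are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real feature table (names are hypothetical).
rng = np.random.default_rng(0)
feature_names = [
    "water_system_complaints",
    "plumbing_complaints",
    "sales_volume",
    "median_price",
]
n_rows = 500
X = rng.random((n_rows, len(feature_names)))
# Made-up target: depends mostly on the first feature, not at all on the last.
y = 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + rng.normal(0, 0.1, n_rows)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_depth=40 mirrors the depth limit mentioned in the text.
model = RandomForestRegressor(n_estimators=50, max_depth=40, random_state=0)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # R² on the held-out test set
importances = dict(zip(feature_names, model.feature_importances_))

print(f"R² on test set: {r2:.3f}")
for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

The `feature_importances_` attribute is what lets a random forest rank variables by how much they contribute to the splits, which is the kind of ranking discussed in this section.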

In conclusion, we like the resulting model, and it shows well the influence of prices on the decision to create a claim: it seems that the more expensive a property is, the more likely its residents are to report something. It may be worthwhile for New York City Hall to analyze the areas with low market volume, because we understand that the water quality problems are in these areas and not in the more expensive ones.

Conclusion

It is possible to see that the city of New York is massive and obviously has many problems, and the findings of this study can be very important for decision making. Reporting these findings is relevant at a time when cities are more and more attentive to the goals set by the UN for a more sustainable world, and NYC is key in solving this global problem.

We can see that one of the most valued regions of the city has better management of the solutions to drinking water problems than its neighbors. This kind of behavior by the government causes inequality and an indirect process of gentrification.

As future actions to further develop the methodology, we understand that it is of utmost importance to have more detailed data on how the requests made through 311 are resolved, and to analyze other environmental factors (flooding, CO2, waste, etc.) in order to have a complete picture of the behavior of the city in relation to sustainability and human health. Still, we believe that the findings of this article are of great value because they support the issues already discussed.

Datasets used:

  1. 311 Service Requests: 311 is a phone number used in the U.S. that allows callers to access non-emergency municipal services, report problems to government agencies, and request information.

  2. Drinking Water Quality Distribution Monitoring Data: Data collected to fulfill the requirements of the SWTR (Surface Water Treatment Rule) and FAD (Filtration Avoidance Determination). Data is collected via grab sampling, analysis, LIMS data capture and reporting. Each record represents either a four-hour turbidity result, a 24-hour average turbidity result, or a daily fecal coliform result from DEL18DT (Delaware Shaft 18 downtake).

  3. Rolling Sales: Annualized Sales files display yearly sales information of properties sold in New York City. These files also have information such as neighborhood, building type, square footage and other data.

  4. NYC ZIP Code Tabulation: This data set was used to fetch the polygons for each ZIP code in order to visualize the house prices in a spatial form together with the water quality indices in the city regions.