by Shreyas Kar
Water is the most precious liquid known to humans. In recent years, conserving clean water has become a major societal problem, with estimates predicting significant water shortages even in developed countries by the mid-21st century.
To combat this problem, a machine learning-based regression model was created and trained on a carefully selected set of features to predict household water usage under a given weather condition. IoT-enabled water flow sensors and automated valves are proposed for installation at several junctions in a city water distribution system. If usage at a junction exceeds the model's prediction for that weather condition by a threshold percentage, an alert is sent to the affected users and to the water distribution authority. In extreme cases, where usage is far higher than predicted and indicates a possible pipe break, the valves automatically shut or reduce the water flow.
I obtained historical weather and water consumption data for Austin, Texas, a relatively large city in the United States. The weather data, from the National Oceanic and Atmospheric Administration (NOAA), consisted of 13,040 examples with numerous features: the average, high and low temperature for each date; the average, high and low humidity for each date; and sea level pressure, visibility and precipitation. The 13,040 examples were recorded daily in the city of Austin over a span of 4 years. The water consumption data consisted of monthly recordings of the number of gallons consumed in various types of homes. The homes fell into 4 categories: multi-family irrigation, multi-family residential, single-family residential and single-family irrigation.
First, the numerical data is normalized using z-score normalization, and the categorical variable is converted into numerical form. One-hot encoding is used to create one column for each category, where a 1 is placed if a particular example belongs to that category and a 0 is placed if it does not.
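The two preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the function names and the sample values are hypothetical, and the population standard deviation is assumed for the z-score.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Return (categories, rows); each row has a 1 in its own category's column."""
    categories = sorted(set(labels))
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[column_of[label]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical sample values, for illustration only.
avg_temperature = [50.0, 60.0, 70.0, 80.0]
house_types = [
    "Single-family residential",
    "Multi-family irrigation",
    "Single-family residential",
    "Multi-family residential",
]

normalized_temperature = z_score(avg_temperature)
categories, encoded_house_types = one_hot(house_types)
```

Each encoded row contains exactly one 1, so the house type can be fed to a regression model alongside the normalized weather features.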
To select appropriate features, a heat map showing the Pearson correlation (r) of every pair of variables was built. In particular, the correlation of each feature with gallons consumed was examined. After a review of the existing literature and the heat map, 3 features from the weather data set (average temperature, average humidity and precipitation level) and one feature from the water consumption data set (the type of house, a categorical variable) were chosen.
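The numbers behind such a heat map are the pairwise Pearson correlations. The sketch below computes them in plain Python; the function names, column names and values are hypothetical and serve only to show the calculation, and plotting is left out.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson r (the heat map values)."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical monthly values, for illustration only.
columns = {
    "avg_temperature": [52.0, 61.0, 70.0, 84.0, 90.0],
    "avg_humidity": [70.0, 66.0, 64.0, 58.0, 55.0],
    "gallons": [4100.0, 4700.0, 5600.0, 7200.0, 8000.0],
}
matrix = correlation_matrix(columns)

# Correlation of each candidate feature with the target column.
with_gallons = {name: matrix[name]["gallons"] for name in columns if name != "gallons"}
```

Features whose r with gallons is close to +1 or -1 are stronger candidates than those near 0, which is the reading applied to the heat map.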
After careful research into an appropriate machine learning model for this prediction, two ML models were tried, a recurrent neural network and polynomial regression, because the task is to predict a continuous value (regression) rather than a class label. Polynomial regression was ultimately chosen because it gave better accuracy. Training used 100 epochs (iterations), and gradient descent was run 50 times. The accuracy on the test data was 0.92, making the ML model a strong predictor of water usage.
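For readers unfamiliar with the technique, polynomial regression fitted by gradient descent can be sketched as below. This is a single-feature, degree-2 illustration on synthetic data, not the project's model: the degree, learning rate, step count and function names are all assumptions, and the R-squared score shown is only one common way to score a regression fit, which may differ from the accuracy measure reported above.

```python
def poly_features(x, degree):
    """Expand x into [1, x, x^2, ..., x^degree]."""
    return [x ** power for power in range(degree + 1)]


def predict(weights, x):
    """Evaluate the fitted polynomial at x."""
    features = poly_features(x, len(weights) - 1)
    return sum(w * f for w, f in zip(weights, features))


def fit_polynomial(xs, ys, degree=2, learning_rate=0.1, steps=2000):
    """Minimize mean squared error with batch gradient descent."""
    rows = [poly_features(x, degree) for x in xs]
    weights = [0.0] * (degree + 1)
    n = len(xs)
    for _ in range(steps):
        errors = [
            sum(w * f for w, f in zip(weights, row)) - y
            for row, y in zip(rows, ys)
        ]
        # Gradient of the mean squared error with respect to each weight.
        gradients = [
            (2.0 / n) * sum(e * row[j] for e, row in zip(errors, rows))
            for j in range(degree + 1)
        ]
        weights = [w - learning_rate * g for w, g in zip(weights, gradients)]
    return weights


def r_squared(weights, xs, ys):
    """Coefficient of determination: 1 minus residual over total variance."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(weights, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic, already-normalized input and a known quadratic target.
xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]

weights = fit_polynomial(xs, ys)
score = r_squared(weights, xs, ys)
```

Normalizing the input first, as in the preprocessing step, keeps the powers of x on a similar scale, which is what lets a fixed learning rate converge here.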
A water flow sensor (YF-S201) is connected to an Arduino controller (Arduino Uno), and a program is written to measure the water flow at the node. The data is then sent to and stored in a Google Cloud Firestore database using the Johnny-Five IoT framework. This data is compared with the prediction from the machine learning model, and an alert is generated once a threshold is reached. There are two threshold values: one for alerting the users when usage is, say, 30% above the predicted value, and another, the Very High Usage Threshold, above which an action such as closing the valve at the node is also performed automatically. In the latter case, a critical alert is sent to the users and the supplier about a possible pipe burst or breakage.
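The two-threshold comparison can be sketched as a small decision function. The 30% alert level follows the example in the text; the 100% very-high level, the function name and the enum are hypothetical placeholders, since the text does not give a value for the Very High Usage Threshold.

```python
from enum import Enum


class Action(Enum):
    NORMAL = "normal"          # usage within the expected range
    ALERT = "alert"            # notify the users and the supplier
    SHUT_VALVE = "shut_valve"  # critical alert and close the valve


def classify_usage(measured, predicted, alert_pct=30.0, very_high_pct=100.0):
    """Compare measured usage with the model's prediction for that node.

    alert_pct uses the 30% example from the text; very_high_pct is an
    assumed placeholder for the Very High Usage Threshold.
    """
    if predicted <= 0:
        raise ValueError("predicted usage must be positive")
    excess_pct = (measured - predicted) * 100.0 / predicted
    if excess_pct >= very_high_pct:
        return Action.SHUT_VALVE
    if excess_pct >= alert_pct:
        return Action.ALERT
    return Action.NORMAL


# Example: the model predicts 100 gallons for the current weather condition.
normal = classify_usage(measured=110.0, predicted=100.0)
alert = classify_usage(measured=135.0, predicted=100.0)
critical = classify_usage(measured=250.0, predicted=100.0)
```

Checking the higher threshold first ensures that a reading far above the prediction triggers the valve action rather than only an alert.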
The proposed system is to be implemented as follows. The IoT-based sensor will collect real-time water consumption data at a particular valve serving one household or a cluster of households. This data will then be fed into the Google Cloud Firestore database. The consumption is compared against the water consumption predicted by the machine learning model for the current weather condition. If the consumption is beyond the alert threshold, alerts are sent to the users so that they are aware of the additional usage and can take informed action to reduce water usage or repair leaks if required. If the water consumption is abnormally high, that is, above the very high usage threshold, the IoT-enabled valves are closed remotely so that large water losses can be prevented. The system is designed so that a shutoff affects as few homes as possible. It has the potential to modernize the water delivery system and conserve precious clean water.
Technologies Used:
The technologies used in this project are the Python and SQL programming languages, the Johnny-Five IoT framework (written in JavaScript), the Google Cloud Firestore database, an Arduino controller and IoT-enabled water flow sensors.