workshop1

Random Forest Reading Response

Sydney Taylor

3.25.20

Questions

In the reading today (Stevens et al.) the authors use a technique to produce a high resolution description of the distribution of human populations across the globe. What is the name of the technique? Describe in general and basic terms how it works.

The authors created a new technique that is based on a “random forest.” This “nonparametric predictive model” draws on large data sets to map population in different parts of the world. In general terms, a random forest builds many decision trees, each trained on a random sample of the data, and combines their predictions, which lets it capture relationships that many other methods cannot. The method incorporates remotely sensed and geospatial data at multiple scales into a dasymetric model, which uses that ancillary information to spread census counts across a finer grid. The authors apply it in case studies of three countries: Kenya, Vietnam and Cambodia.
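To make the idea of "many trees, combined" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: each "tree" is a single-split stump, the two covariates (night-time lights and distance to road) and all numbers are invented for illustration, and the function names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature_indices):
    """Fit a one-split regression tree (a "stump") using the given features."""
    best = None  # (squared error, feature, threshold, left mean, right mean)
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, f, threshold, lm, rm)
    if best is None:
        # No valid split in this sample: always predict the sample mean.
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def predict_stump(stump, row):
    f, threshold, lm, rm = stump
    return lm if row[f] <= threshold else rm

def fit_forest(rows, targets, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(n_features), max(1, n_features // 2))
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    """The forest's prediction is the average of its trees' predictions."""
    return mean([predict_stump(s, row) for s in forest])

# Invented training data: (night-time lights, km to nearest road) -> people per km^2
rows = [(0.9, 0.5), (0.8, 1.0), (0.7, 2.0), (0.2, 8.0), (0.1, 12.0), (0.05, 15.0)]
targets = [900, 750, 600, 80, 40, 10]

forest = fit_forest(rows, targets)
urban = predict_forest(forest, (0.85, 0.8))   # bright, close to a road
rural = predict_forest(forest, (0.08, 14.0))  # dark, far from a road
```

With this toy data the bright, well-connected pixel gets a higher predicted density than the dark, remote one. A real analysis would use full decision trees from an established library and far more data.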

The random forest method used by the authors is a machine learning algorithm (ensemble method). In general terms, what is a machine learning algorithm? Within the context of this study what distinguishes a data science, machine learning method (such as a random forest) from previous classical statistical approaches to describing and analyzing phenomena and events?

A machine learning algorithm is a mathematical procedure that learns patterns from data instead of following rules fixed in advance. The more data the algorithm is trained on, the “stronger” its predictions tend to become. This not only improves the results but can also shape future methods. In this study, the random forest is used to predict per-pixel population density. These predictions can improve population estimates throughout the world. The predicted densities are then used as weights to redistribute census counts across the pixels. This approach is very different from classical statistical methods, which typically start from a model form chosen by the analyst.
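The weighting step can be illustrated with simple arithmetic. The sketch below assumes one census unit with a known total and four pixels with predicted densities. The function name and all numbers are hypothetical and are not taken from the paper.

```python
def redistribute(census_count, weights):
    """Split one census unit's count across its pixels in proportion to the weights."""
    total = sum(weights)
    if total == 0:
        # Nothing to weight by, so spread the count evenly.
        return [census_count / len(weights)] * len(weights)
    return [census_count * w / total for w in weights]

# Hypothetical per-pixel density predictions for one census unit of 10,000 people
predicted_density = [900.0, 600.0, 80.0, 20.0]
pixel_population = redistribute(10_000, predicted_density)
```

Each pixel receives a share of the 10,000 people proportional to its predicted density, so the pixel totals still add up to the census count.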

In the reading, the authors use a number of geospatial covariates and approximately how big of a data set did they represent (in general terms)? What is the significance of big data in the estimation of machine learning methods for inferring the correlates and drivers of human population distributions?

Geospatial covariates provide vital information about population density and are the inputs from which new data sets are created. This particular case uses numerous covariates to make sense of its big data sets. For example, the use of a “buffered polygon minimizes the effects associated with near-border areas.” This ensures more accurate data for a better understanding of human development and population densities. Big data paired with machine learning methods helps address the complication of data that change every minute. With new data constantly needing to be collected and analyzed, it is important to ensure proper collection methods and the use of accurate information for the public. Especially with population data, numerical research has the power to create panic among millions worldwide.

The authors’ results present a remarkable improvement over previous geospatial descriptions at very high resolution, of the distribution of the human population. Within the context of human development in LMICs, what is the significance of having a highly accurate description of where each person is located across planet earth?

Data accuracy is extremely important for human development. The random forest method uses new forms of data collection that will provide accurate information for the public. As the human population increases, it is important to have accurate density estimates. Policy makers, companies and industries need to understand the amount of resources needed to maintain the growth and development of mankind. By providing an accurate analysis of population density, the random forest method can impact the lives of millions.

Within the context of human development in LMICs, what is the relevance to your area of investigation in having a highly accurate description of where each household and person is located across planet earth?

For my research, I decided to focus on West African countries. This region has some of the lowest education rates in the world, reflecting low literacy, limited teacher availability, poor student attendance and many other factors. An understanding of population density can greatly inform my research. As millions more people are added to the world's population, education will not be equally accessible to all. It is important to know which regions have higher density than others, since these may also differ in their education rates.