
DATA150_FALL_2021

Research Proposal: Using Data Science to Assess the Prevalence of Malnutrition in Sub-Saharan Africa

Word Count: 2528

Introduction:

Making humans the center of human development requires that we focus on individual rights, because focusing on these underlying principles helps meet the end goal of human development: a satisfactory and enjoyable life. Amartya Sen captures these ideas in his book Development as Freedom. In it, he explains that development is the process of expanding the real freedoms that people enjoy. He further explains that when an institutional barrier deprives people of these inherent freedoms, they can no longer make the basic yet influential choices that help them live enjoyable lives and ultimately create a society in which everyone is satisfied. To that end, I believe that a major factor in attaining these freedoms is access to necessities. Without food and water to survive, people cannot strive for these freedoms, which prevents human development. Thus, throughout these assignments, I focused my research on malnutrition in Ethiopia, specifically on how data science methods have been used to assess the current malnutrition problem in this impoverished country. While researching, I discovered a major research gap. When I did my initial research on malnutrition at the beginning of the semester, I realized that many studies on this topic focused on individual countries such as Algeria or Ethiopia (which is why I chose Ethiopia), meaning there was a lack of research covering Sub-Saharan Africa as a whole, a vast area where poverty and malnutrition are rampant. Because research must be done in this area, in this paper I will propose a methodological approach for assessing the prevalence of malnutrition in Sub-Saharan Africa.
Specifically, I will focus my assessment on children under the age of five, as malnutrition has a disproportionate effect on young children, who need proper nutrition in order to develop physically and cognitively during this crucial stage of life. Thus, my new research question is as follows: What is the extent of malnutrition among children under five in Sub-Saharan Africa, and what can we do to mitigate its detrimental effects on a wide scale?

Background:

Before I go further, it is important to understand the current economic state of countries in Sub-Saharan Africa. Sub-Saharan Africa is ravaged by poverty, with millions living on less than $1.90 a day. Not only is the poverty rate in this region over 40%, but of the world’s 28 poorest countries, 27 are in Sub-Saharan Africa. Furthermore, World Bank projections show that extreme poverty is showing little to no sign of improvement in the coming years, due to the region’s slow economic growth. In fact, it seems like the opposite is happening. According to the World Bank, the number of people living in poverty grew from 278 million in 1990 to 413 million in 2015, and this trend shows no sign of stopping [2]. Because Sub-Saharan Africa is in an impoverished economic state, children have a hard time getting the nutrition they need for growth. The extent of this reality is for the most part unknown, implying that action must be taken, and it must be taken now.

Research Plan:

To tackle this issue, I propose a three-step plan, summed up as follows: measure, analyze, and solve. Given that I am dealing with an area containing nearly 1.1 billion inhabitants spread across 9.4 million square miles, it is important that I divide and conquer when it comes to measurement [3,5]. Thus, I will first divide this region into its four commonly recognized subregions: West Africa, Central Africa, East Africa, and Southern Africa. I will then send one 100-person survey team to each of the regions, where each of the four teams will measure the height and weight of around 50,000 children in its designated region. In addition to taking the measurements, the surveyors will ask the child’s parents to answer a questionnaire that assesses the child’s quality of life. The questionnaire is split into three sections. The first section asks basic questions about the child, including the child’s sex, age, birthweight, approximate calorie intake per day, and overall health. The second section asks questions about the household’s quality of life, including whether the family has access to clean water, toilet facilities, and health services. The third section asks questions concerning the parents, specifically their ages and educational status. This section also includes sensitive questions, such as how long the mother breastfed and the mother’s age when the child was born. Each time a questionnaire is given out, it is the responsibility of the surveyor to explain the terms of agreement for data privacy. Included in these terms is the right of the parents to disclose only as much information as they are comfortable with. Moving forward, only a portion of the collected data will be used in the analysis, as a dataset of 50,000 children per region will take too long to analyze.
Thus, to determine which children are chosen, the survey teams will use the same hierarchical selection method used in a previous study that examined the prevalence of malnutrition in Ethiopian children under five. For context, the Ethiopian study’s survey data consisted of the heights and weights of tens of thousands of children and mothers, but the researchers condensed the data to 9494 eligible pairs in the following way. They first stratified the data by 11 distribution sites and then condensed it to about 1200 enumeration areas. Following this, they condensed it to 18,008 households, and finally to around 11,000 eligible mother-child pairings, all of whose children appeared to be malnourished. After extensive visits and interviews with these pairings, 9494 were chosen for analysis [1]. In the context of this proposal, each team will condense the 50,000 records it collected in its region to around 20,000 that will be used in the analysis. The distribution sites for a region will be the 20 most populated cities or areas within that region. Given the number of workers involved, I estimate that the data collection will take approximately one and a half years, while condensing the data will take half a year.
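The condensing step can be illustrated with a short sketch. It is not the Ethiopian study’s procedure: it substitutes simple proportional random sampling for the multi-stage selection described above, and it assumes that each child record carries a hypothetical `site` label identifying one of the 20 distribution sites. The function name `stratified_subsample` and the record fields are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_subsample(records, site_of, target, seed=0):
    """Draw about `target` records, giving each site a share of the draws
    proportional to its share of the collected records."""
    rng = random.Random(seed)          # fixed seed so the draw is repeatable
    by_site = defaultdict(list)
    for record in records:
        by_site[site_of(record)].append(record)

    total = len(records)
    sample = []
    for site in sorted(by_site):
        members = by_site[site]
        k = round(target * len(members) / total)   # proportional allocation
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical region: 50,000 children spread evenly over 20 distribution sites.
records = [{"child_id": i, "site": i % 20} for i in range(50_000)]
sample = stratified_subsample(records, lambda r: r["site"], target=20_000)
print(len(sample))   # 20000
```

With evenly sized sites, each site contributes exactly 1,000 children. With uneven sites, rounding means the total may differ slightly from the target.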

Having compiled the 80,000 data points that will be used for analysis, the next step is to analyze the data. I will recruit a total of 40 data scientists, divide them into four groups of 10, and assign each group to analyze the 20,000 data points associated with one of the four regions. To analyze the data, they will use logistic regression, a predictive analysis method that explains the relationship between an independent variable (x) and a dependent variable (f(x)) by fitting a logistic (S-shaped) curve to the observed data. Specifically, this method is often used to explain the relationship between multiple independent variables and a binary dependent variable (0 or 1) [4]. Using the height and weight survey data, each group of data scientists will measure the nutritional status of the children within their assigned region based on undernutrition indicators that follow the WHO growth standards. The indicators used are stunting (a child who is too short for his or her age), wasting (a child who is too light for his or her height), and underweight (a child who is too light for his or her age), all of which are established from height, weight, and age measurements. The data scientists will compute z-scores for each child’s height-for-age, weight-for-height, and weight-for-age, which express how many standard deviations the child’s measurements fall from the reference median, compare those values to the acceptable standards, and then assign the child to the indicators that apply. It will be possible for some children to be put in multiple categories and for others to be put in none. For the logistic regression models, the dependent variable is quantified as follows: a child is assigned a “1” if he or she meets the criterion for a given indicator and a “0” otherwise. The general rule of thumb is that if a child’s z-score for an indicator is more than 2 standard deviations below the reference median, a “1” is assigned.
Thus, if a given child’s height and weight meet the criterion for “stunting” and not the rest, the corresponding y values are 1,0,0. If another child meets both the stunting and underweight criteria, the corresponding y values are 1,0,1. If a third child meets none of the criteria, the y values are 0,0,0. The data scientists will then fit the appropriate logistic curves to the data for their given region, from which they will determine the prevalence of each of the three indicators and other trends. Following this, the data scientists will assess the most common factors, or predictors, associated with each of the three forms of malnutrition. To do this, they will compare each child’s questionnaire data to his or her malnutrition data, using multilevel statistical methods, odds ratios, and confidence intervals to calculate how often a certain form of malnutrition coincides with each factor listed in the questionnaire. I believe that an explanation of the exact method is beyond the scope of this course, as it requires a strong understanding of statistical methods. Following this analysis, the data scientists will report the five most common predictors associated with each form. I estimate that the analysis process will take a year [1]. In total, the methods described above follow those of the Ethiopian study, which assessed the prevalence of malnutrition among children under five. However, what makes this proposal different is that the scale of the research, measurements, and analysis is much greater than that of the Ethiopian study.
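The coding of the three indicators and a basic odds ratio can be sketched as follows. This is a minimal illustration, not the full analysis: it assumes the z-scores have already been computed against the WHO growth standards, it does not fit the logistic or multilevel models, and the function names, example z-scores, and household counts are all hypothetical.

```python
from math import exp, log, sqrt

# A z-score more than 2 SD below the reference median flags the indicator.
CUTOFF = -2.0

def classify(haz, whz, waz):
    """Return (stunting, wasting, underweight) as 0/1 flags.

    haz, whz, waz are the height-for-age, weight-for-height, and
    weight-for-age z-scores, assumed to be precomputed.
    """
    return (int(haz < CUTOFF), int(whz < CUTOFF), int(waz < CUTOFF))

def prevalence(flags):
    """Share of children flagged for each of the three indicators."""
    n = len(flags)
    return tuple(sum(f[i] for f in flags) / n for i in range(3))

def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% confidence interval from a 2x2 table.

    a: exposed and malnourished      b: exposed, not malnourished
    c: unexposed and malnourished    d: unexposed, not malnourished
    """
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # standard error of log(OR)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)

# Hypothetical z-scores for three children, mirroring the examples in the text.
children = [(-2.5, -1.0, -1.2), (-2.4, -0.5, -2.1), (0.1, 0.3, -0.4)]
flags = [classify(*zs) for zs in children]
print(flags)               # [(1, 0, 0), (1, 0, 1), (0, 0, 0)]
print(prevalence(flags))

# Hypothetical counts: households without clean water vs. with clean water.
print(odds_ratio(20, 80, 10, 90))
```

An odds ratio above 1 whose interval excludes 1 would suggest that the factor is associated with that form of malnutrition. In the hypothetical table the estimate is 2.25, but the interval narrowly includes 1, so the evidence would be inconclusive.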

With a deep understanding of both the malnutrition problem in Sub-Saharan Africa and the factors most associated with the three forms of undernutrition, the next step is to solve. Specifically, we aim to lessen the prevalence of malnutrition and address the major predictors discovered in the previous step. To lessen the prevalence of malnutrition, the obvious answer is to provide more food to those in need. To do this, I will partner with the United States Agency for International Development (USAID), and together we will seek to deliver as much food as we can to those in need. I expect that the food we deliver will be pre-packaged and contain uncooked ingredients like rice, soy, and potato, all of which are rich in nutrients. I have chosen to partner with USAID because I admire the work they are currently doing in Ethiopia. One of their notable activities is the “Feed the Future Ethiopian Growth” nutrition project, which aims to improve the nutritional status of all ages, with an emphasis on improving early childhood nutrition during the first two years of a child’s life. Another is USAID’s “Development Food Security Assistance Program,” which seeks to improve the nutritional status of pregnant and lactating mothers, keeping the mother healthy while allowing the young or future child to enter the world better off. Though difficult, implementing similar programs on a large scale in Sub-Saharan Africa will prove very beneficial in the long run. On the other hand, the solutions that address the major predictors vary from predictor to predictor. If a predictor is the low educational status of parents, the solution is to provide free or low-cost educational opportunities in the form of books, scholarships, and funding to improve the educational system. If a predictor is a lack of access to clean water, the solution is to provide funding to help create better water filtration systems [6].
Obviously, implementing these solutions will take longer than one to three years; this section is included to give the audience a perspective on what could be done in the near future to help solve this problem. In total, these solutions will set up young children in Sub-Saharan Africa with a better future, open them up to better opportunities, and ultimately promote human development.

Hypothesis:

I hypothesize that the most prevalent form of malnutrition will be underweight, because both stunting and wasting tend to coincide with being underweight. A child who is stunted is also likely to be underweight, because a shorter stature also means a lighter body. Similarly, a child who is wasted is also likely to be underweight: given that height and age are correlated in early childhood, a child whose weight is too low for his or her height will likely also have a weight that is too low for his or her age.

Drawbacks, Obstacles, and Solutions:

No research proposal is perfect, meaning there are bound to be drawbacks and obstacles in the process. In this case, most of the drawbacks come in the initial measurement step. One of the major drawbacks associated with collecting data from and surveying vulnerable, poor families is the issue of data privacy. There is a chance that many families will not want to participate in the survey or, if they do, will provide little to no information on the questionnaires, making both the data collection and the data analysis even harder. Nothing can truly be done to overcome this obstacle, as the choice comes down to the people being surveyed. However, for the most part, people are willing to participate if they know the information is being used for a good cause. Another barrier is language. Surveying requires communication, which means translation is required. However, with so many African languages and dialects, translation becomes difficult, which slows the process. To overcome this obstacle, it is important that the surveyors are equipped with some sort of translation AI that covers a variety of African dialects, so that they are prepared for any situation. Another barrier is education. Filling out a questionnaire requires some degree of literacy. However, because many of the people surveyed may not even know when their next meal will be, the likelihood of literacy is quite low. To overcome this obstacle, instead of handing a written survey to the parents to fill out, the surveyors could convey the survey questions verbally using the translation AI.

Budget:

What follows is a broad description of the overall budget. Obviously, this is an estimate, meaning numbers might be skewed and components might be missing.

Works Cited

[1] Kasaye HK, Bobo FT, Yilma MT, Woldie M (2019) Poor nutrition for under-five children from poor households in Ethiopia: Evidence from 2016 Demographic and Health Survey. PLoS ONE 14(12): e0225996. https://doi.org/10.1371/journal.pone.0225996

[2] Patel, N. (2018, November 21). Figure of the Week: Understanding Poverty in Africa. brookings.edu. https://www.brookings.edu/blog/africa-in-focus/2018/11/21/figure-of-the-week-understanding-poverty-in-africa/

[3] O’Neill, A. (2021, July 28). Sub-Saharan Africa: Total Population from 2010-2020. statista.com. https://www.statista.com/statistics/805605/total-population-sub-saharan-africa/

[4] Statistics Solutions. (n.d.). What is Logistic Regression? statisticssolutions.com. https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/what-is-logistic-regression/

[5] Stockdale, N. (2017, December 18). Sub-Saharan Africa. https://www.fortbendisd.com/cms/lib/TX01917858/Centricity/Domain/1006/SSA%20Overview.pdf

[6] USAID. (2021, October 21). Nutrition. usaid.gov. https://www.usaid.gov/ethiopia/nutrition