# Introduction

This document provides an overview and some exploratory plots of the Brazil Live Births “Sistema de Informações sobre Nascidos Vivos” (SINASC) data and the Brazil Census data, both available from Datasus. According to the website, SINASC “was officially deployed in 1990 with the objective of collecting data on births throughout the country and providing birth data for all levels of the Health System.”

The SINASC data is publicly available to download from the Datasus FTP site. For the results shown in this document, data was downloaded for births from 2001 to 2015, resulting in 44.5 million birth records. The final section of this document goes into more detail about how the data was processed and which variables are available.

Given the quality, completeness, and size of this data, there are many research questions that could potentially be studied using it. Ultimately, we will be trying to understand the risk factors for low birth weight, but this document mostly contains introductory exploratory visualizations to help illustrate what is in the data.

# SINASC Exploration

In this document, we will look at some visualizations of the variables in the data, ranging from summaries at the population level to summaries at the state and municipality levels.

## Population Summaries

### Yearly Births

The annual number of births is around 3 million, having dipped to under 2.9 million before rebounding in 2014.

### Daily Births

This plot shows the number of daily births broken down by birth month and colored according to whether the birth was on a weekend or weekday. Breaking things up in this way helps highlight some interesting phenomena.

First, we can see that there is a clear annual cycle for births, with the most popular months being March, April, and May. Also, we see that there are significantly more births on weekdays vs. weekends.

How can the number of births vary so much over different temporal breakdowns? The weekday/weekend phenomenon is the easier one to explain: a large proportion of births in Brazil are scheduled cesareans (as we’ll see below), and doctors will most likely schedule these for weekdays rather than weekends.
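As a rough illustration of the breakdown described above, the following Python sketch counts births per calendar day and tags each day with its month and a weekend flag. It is not the code used to produce the plot (the analysis in this document was done in R); the function name `daily_birth_counts` and the sample dates are hypothetical, and the only thing taken from the data is that birth dates are stored as `ddmmyyyy` strings.

```python
from collections import Counter
from datetime import datetime


def daily_birth_counts(birth_dates):
    """Count births per day from ddmmyyyy strings and tag month and weekend."""
    per_day = Counter(
        datetime.strptime(raw, "%d%m%Y").date() for raw in birth_dates
    )
    rows = []
    for day, n_births in sorted(per_day.items()):
        rows.append(
            {
                "date": day,
                "month": day.month,
                # date.weekday() returns 0 for Monday through 6 for Sunday.
                "weekend": day.weekday() >= 5,
                "births": n_births,
            }
        )
    return rows


# Hypothetical input: two births on Saturday 3 January 2015, one on Monday 5 January.
for row in daily_birth_counts(["03012015", "03012015", "05012015"]):
    print(row)
```

Each returned row corresponds to one point in a plot of this kind: `month` would select the panel and `weekend` the color.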

### Yearly Births by Delivery Type

The only two categories for delivery type are vaginal and cesarean. As can be seen, cesarean births have become dramatically more prevalent than vaginal births, with the inflection point in 2009.

Another way to look at this is to present the two delivery types as percentages of births over time.

### Birth Weight vs. Year

This plot and all of the following summary plots examine the relationship between birth weight and other variables in the data. The dots represent the median birth weight value and the lines extend from the dots to the first and third quartiles, such that the span of the line represents the middle 50% of observations in the data to give an indication of variability.
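The summary behind each dot and line can be sketched as follows. This is a minimal Python illustration rather than the code used for the plots; the function names are hypothetical, the grouping key is any variable in the data (for example `birth_year`), and the choice of quantile method (`"inclusive"`) is an assumption, since the document does not say which quartile definition was used.

```python
from collections import defaultdict
from statistics import median, quantiles


def summarize_birth_weight(weights_g):
    """Return the first quartile, median, and third quartile of birth weights."""
    # quantiles() needs at least two observations.
    q1, _, q3 = quantiles(weights_g, n=4, method="inclusive")
    return {"q1": q1, "median": median(weights_g), "q3": q3}


def summarize_by_group(records, key):
    """Summarize brthwt_g separately for each value of `key`."""
    groups = defaultdict(list)
    for rec in records:
        if rec["brthwt_g"] is not None:
            groups[rec[key]].append(rec["brthwt_g"])
    return {group: summarize_birth_weight(w) for group, w in groups.items()}


# Hypothetical weights in grams.
print(summarize_birth_weight([2500, 3000, 3200, 3400, 3900]))
```

The dot would be drawn at `median` and the line from `q1` to `q3`.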

This plot shows, as aggregated across the entire country, that the distribution of birth weight has not changed over time.

### Birth Weight vs. Gestational Age at Birth

As we would expect, pre-term birth is strongly associated with lower birth weight. The horizontal dashed line at 2500 grams indicates the threshold for low birth weight.

### Birth Weight vs. Pregnancy Type

Carrying more than one child has an obvious association with birth weight.

### Birth Weight vs. Mother’s Age

There is an interesting association between the mother’s age and birth weight. This plot is also interesting in that it shows that the span of a childbearing mother’s age is quite large.

### Birth Weight vs. Number of Prenatal Visits

Although the relationship is not especially strong, a higher number of prenatal visits appears to be associated with higher birth weights.

## By State

The previous plots looked at the data as a whole population. It is interesting to look on a geographical level to see how the variables vary across geography and time. In this section we will look at the geographical granularity of the 27 states of Brazil.

Most of the plots in this section show the states as a regular grid, laid out to mirror the geography as closely as possible.

### Percent Change in Births Over Time

This plot shows the percentage change in the number of births over time and by state, using the number of births in 2001 as the baseline for the percent change calculation. For example, in Alagoas, there were 20% fewer births in 2015 than in 2001, whereas in Espírito Santo, the number of births in 2015 was unchanged from 2001, although it dipped in the intervening years.

Each panel of the display is colored according to the 2015 percent change value to help draw attention to states that have had a net increase or decrease in number of births. This plot does not take population into account.
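The percent change calculation can be written out as a short Python sketch. The function name and the birth counts below are hypothetical; only the use of 2001 as the baseline comes from the description above.

```python
def percent_change_from_baseline(births_by_year, baseline_year=2001):
    """Percent change in births for each year relative to the baseline year."""
    baseline = births_by_year[baseline_year]
    return {
        year: 100.0 * (n_births - baseline) / baseline
        for year, n_births in sorted(births_by_year.items())
    }


# Hypothetical counts for one state.
changes = percent_change_from_baseline({2001: 50000, 2008: 42000, 2015: 40000})
print(changes)
```

With these made-up counts, the 2015 value is -20.0, which is the kind of figure quoted for Alagoas. Because the input is raw birth counts, the result says nothing about changes in population.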

### Percentage of Children Born With Low Birth Weight

This plot shows the yearly percentage of children born with low birth weight for each state. While we saw earlier that the overall distribution of birth weight for the entire population does not change over time, the percentage of low birth weight children within a state is not necessarily constant over time.

The percentage of low birth weight children ranges from 5% to 10%. The panels of the display are colored based on the 2015 low birth weight percentage to help draw attention to regions where this percentage is higher or lower than in others. In general, the states in the southern half of the country have a higher prevalence of low birth weight than those in the northern half.

Some states are showing an increasing rate of low birth weight births. For example, Ceará saw the largest 2001-2015 increase of 1.6%.
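For reference, the statistic in this section can be sketched in Python as below. The function name and sample weights are hypothetical. Treating "low birth weight" as strictly below 2500 grams is an assumption about how the boundary case was handled; the document only gives 2500 grams as the threshold.

```python
LOW_BIRTH_WEIGHT_G = 2500  # threshold stated in the text


def low_birth_weight_pct(weights_g, threshold_g=LOW_BIRTH_WEIGHT_G):
    """Percentage of births with a recorded weight below the threshold."""
    valid = [w for w in weights_g if w is not None]  # skip missing weights
    if not valid:
        return None
    # Assumption: a weight of exactly 2500 g is not counted as low.
    n_low = sum(1 for w in valid if w < threshold_g)
    return 100.0 * n_low / len(valid)


# Hypothetical weights in grams for one state and year.
print(low_birth_weight_pct([2400, 2500, 3100, 3300, None]))
```

Applying this to each state and year gives one line per panel. A change over time is then the difference between two such percentages.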

### Distribution of Delivery Type

This plot shows the breakdown of delivery type as a percentage of births each year by state. From this plot it is clear that cesarean birth is increasing in all states, but it is more prevalent in some states than others. For example, many states in the northwest still have vaginal births as the majority.

### Distribution of Mother’s Education

This plot shows that the education level of mothers giving birth has been improving over time across all states.

### Distribution of Race

This plot shows the distribution of race of the children born over time and by state. From this we see that across nearly all states, the multiracial share of births is growing relative to the other groups. We also see that children born in the southern states are predominantly white, and that only a few states (in the northwest) have a significant number of indigenous children being born.

### Distribution of Marital Status

From this we can observe that births to single mothers are more common than births to mothers of any other marital status, and that they are on the rise across all states. In a couple of states, such as Minas Gerais and Espírito Santo, the proportions of single and married mothers are nearly equal, while in most other states single mothers are more prevalent.

### Distribution of Prenatal Visits

This plot shows that mothers in the southern states typically have more prenatal visits than those in the north.

### Distribution of Binned Mother’s Age

Here we have binned the mother’s age into teens, twenties, thirties, and forties+. We see the general trend of mothers having babies at a later age over time, with some states such as São Paulo and the Federal District having nearly half of births coming from women aged 30 or older in 2015.
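The binning can be sketched in Python as follows. The exact bin edges are an assumption (under 20, 20 to 29, 30 to 39, and 40 or older); the document names the bins but does not state their boundaries, and the function name is hypothetical.

```python
def bin_mother_age(age_yrs):
    """Map a mother's age in years to one of four bins (edges are assumed)."""
    if age_yrs is None:
        return None  # missing ages stay missing
    if age_yrs < 20:
        return "teens"
    if age_yrs < 30:
        return "twenties"
    if age_yrs < 40:
        return "thirties"
    return "forties+"


# Hypothetical ages; the share aged 30 or older combines the last two bins.
ages = [17, 24, 29, 31, 38, 42]
bins = [bin_mother_age(a) for a in ages]
share_30_plus = sum(b in ("thirties", "forties+") for b in bins) / len(bins)
print(bins, share_30_plus)
```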

### Median Mother’s Age

This shows the yearly median mother’s age for each state. Clearly, the mother’s age has been increasing over time everywhere, but many of the northern states have a lower median mother’s age.

## By Municipality

There are over 5500 municipalities, so there are many possible ways to look at the data at the municipality level.

### Mean Birth Weight by Municipality

To start, below is a plot of the mean birth weight for each municipality, plotted by rank within each region of Brazil.

For reference, below is a map with these regions indicated in the same colors.

### Choropleth of Mean Birth Weight

Below is a screenshot of an interactive choropleth map that shows the average birth weight for each municipality.

The interactive map is large and takes a while to load, which is the reason for showing a screenshot here. To interact with the map, follow this link. There, you will be able to pan and zoom and hover to get more information about each municipality.

More to come soon.

# Census Exploration

There are several census datasets (see census data preparation section below), but we will focus on the “RENDABR” dataset, which provides data for the average household income per capita at the municipality level.

Variables in this dataset include:

• muni_code: Municipality code
• year: Year of the data
• race: color / race
• house_inc: sum of the average household income (numerator)
• pop: population considered (denominator)
• n_child: number of children considered
• pop_2mw: population with average household income per capita less than 1/2 minimum wage
• pop_4mw: population with average household income per capita less than 1/4 minimum wage
• n_child_2mw: children in a situation of average household income per capita less than 1/2 minimum wage
• n_child_4mw: children in a situation of average household income per capita less than 1/4 minimum wage
• pop_16unemp: resident economically active population aged 16 and over who are unemployed
• pop_16: resident economically active population aged 16 and over
• pop_10work: resident population with 10 to 15 years of age who is working or looking for work
• pop_10: total resident population with 10 to 15 years of age

The census data is recorded for 1991, 2000, and 2010. For each municipality and race, numbers are provided that can be used to construct statistics (means and proportions) at different levels of aggregation.
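Because the dataset stores numerators and denominators rather than finished rates, statistics at a coarser level are obtained by summing both parts first and dividing afterwards. The Python sketch below illustrates this for two of the variables listed above; the function name and the input rows are hypothetical, and it assumes that `house_inc` and `pop` can simply be added across the rows being combined.

```python
def aggregate_income(rows):
    """Combine per-race (or per-municipality) rows into one mean and one share."""
    total_income = sum(r["house_inc"] for r in rows)  # numerator
    total_pop = sum(r["pop"] for r in rows)           # denominator
    total_below_half_mw = sum(r["pop_2mw"] for r in rows)
    return {
        "mean_income_per_capita": total_income / total_pop,
        "share_below_half_mw": total_below_half_mw / total_pop,
    }


# Hypothetical rows for one municipality and year, one row per race category.
rows = [
    {"house_inc": 600000.0, "pop": 1000, "pop_2mw": 300},
    {"house_inc": 200000.0, "pop": 1000, "pop_2mw": 500},
]
print(aggregate_income(rows))
```

Summing before dividing weights each group by its population, which averaging the per-group means would not do.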

## Population Summaries

### 2010 distribution of average monthly household income by race

There are many other variables and other census datasets to add visualizations for here as well.

## By State

### Approximate weight z-scores

Another way to look at this is whether the difference in weight is

The gestational age categories are “Less than 22 weeks”, “22-27 weeks”, “28-31 weeks”, “32-36 weeks”, “37-41 weeks”, and “42 weeks and more”.

This plot raises a different way of thinking about what might explain the difference between low and high income municipalities. It suggests that low income municipalities on average have pre-term babies with a much higher birth weight than expected under the WHO standard (z-scores greater than 3), while higher income municipalities have babies with a more “expected” birth weight for pre-term birth. This warrants some consideration of why babies in lower income municipalities would weigh more than expected at birth.

Note that there are some caveats associated with this plot. First, since we don’t know exact gestational age, we are using the midpoint of each gestational age range as the value against which we compute the z-score. Second, we are computing z-scores of the means, which have washed out the sex of the child, and the z-score standards are different for each case. However, the calculation should still be a useful approximation.
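The approximation described in these caveats can be sketched in Python as below. Everything about the reference table is hypothetical: the numbers are made up, and representing the standard as a single mean and standard deviation per gestational age is a simplifying assumption (published growth standards are sex-specific and are not reproduced here). The function names are also hypothetical; only the midpoint idea and the category labels come from this document.

```python
def gest_age_midpoint(label):
    """Midpoint in weeks of a closed range such as '32-36 weeks', else None."""
    span = label.split(" ")[0]
    if "-" not in span:
        return None  # open-ended categories have no midpoint
    low, high = (int(part) for part in span.split("-"))
    return (low + high) / 2


def approx_z_score(mean_weight_g, label, reference):
    """z = (observed mean - reference mean) / reference SD at the midpoint."""
    midpoint = gest_age_midpoint(label)
    if midpoint is None or midpoint not in reference:
        return None
    ref_mean_g, ref_sd_g = reference[midpoint]
    return (mean_weight_g - ref_mean_g) / ref_sd_g


# Hypothetical reference values: {weeks: (mean grams, SD grams)}.
reference = {34.0: (2200.0, 400.0)}
print(approx_z_score(2600.0, "32-36 weeks", reference))
```

The two open-ended categories return `None` here because they have no natural midpoint; how they were handled in the plot is not stated.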

# SINASC Data Prep

## Access

The SINASC data is publicly available for download from the Datasus FTP site.

There is a file for each state and year, and the data currently goes up to 2015.

## Data Dictionary

The file located here contains a dictionary for many of the variables in the data, although it does not cover any of the many variables introduced in 2010. This file was used to construct a data dictionary data structure in R that was used to preprocess the data.

### Variables in the Dictionary

The following variables with corresponding English-translated names and labels are available in the dictionary:

| name | name_en | label_en |
| --- | --- | --- |
| NUMERODN | dn_number | DN number sequential by UF and year |
| LOCNASC | birth_place | Place of birth |
| CODESTAB | health_estbl_code | Health establishment code |
| CODBAINASC | birth_nbhd_code | Birth neighborhood code |
| CODMUNNASC | birth_muni_code | Birth municipal code |
| IDADEMAE | m_age_yrs | Age of the mother in years |
| ESTCIVMAE | marital_status | Mother’s marital status |
| ESCMAE | m_educ | Mother’s education |
| CODOCUPMAE | occ_code | Mother’s occupation, according to the Brazilian Occupations (CBO-2002) |
| QTDFILVIVO | n_live_child | Number of living children |
| QTDFILMORT | n_dead_child | Number of deceased children |
| CODBAIRES | res_nbhd_code | Residence neighborhood code |
| CODMUNRES | m_muni_code | Residence municipal code of the mother |
| GESTACAO | gest_weeks | Weeks of gestation |
| GRAVIDEZ | preg_type | Type of pregnancy |
| PARTO | deliv_type | Type of delivery |
| CONSULTAS | n_prenatal_visit | Number of prenatal visits |
| DTNASC | birth_date | Date of birth in ddmmyyyy format |
| HORANASC | birth_time | Time of birth |
| SEXO | sex | Sex |
| APGAR1 | apgar1 | Apgar in the first minute (00 to 10) |
| APGAR5 | apgar5 | Apgar in the fifth minute (00 to 10) |
| RACACOR | race | Race / Color |
| PESO | brthwt_g | Birth weight, in grams |
| IDANOMAL | cong_anom | Congenital anomaly |
| CODANOMAL | cong_icd10 | Code of congenital malformation or anomaly chromosome, according to ICD-10 |
| DTCADASTRO | sys_reg_date | Date of registration in the system |
| DTRECEBIM | rec_reg_date | Receipt date at central level, last date registry update |
| CODINST | reg_gen_code | Registration generation installation code |
| UFINFORM | rep_uf_code | UF code that reported the record |

### All Variables

To illustrate the available variables and how they have evolved over time, the plot below shows the variable name on the y-axis and the year on the x-axis. If a dot is plotted for a given variable and year, it means that data is available for that variable in that year. Variable names that are lowercase indicate variables that are present in the data dictionary, while uppercase variable names indicate variables that are not in the data dictionary.

As can be seen, the data dictionary covers the variables that are common across all years, and the dataset I have constructed contains this subset of variables to keep a common data structure across all years.

Given the many more variables that are available from 2010 and beyond, it may be worth considering whether to do a separate analysis with a subset of the data starting at 2010 so that we can make use of these variables.

## Preprocessing

• After downloading the files from the FTP site, I read them into R using the read.dbc package. The files are in “.DBC” format, a compressed form of DBF that this package can read.
• For each file, I select the 23 variables that all data files have in common.
• Variables are renamed to be English-readable using the data dictionary.
• Variables are cast to the appropriate type (such as dates, factors, etc.).
• Variables that are factors are recast from numbers (e.g. 1, 2) to more meaningful factor levels (e.g. “Male”, “Female”) according to the data dictionary.
• A few records with strange encodings are fixed.
• The variable birth_year is added since it is frequently used.
• The 7th character of birth_muni_code and m_muni_code is dropped to be consistent across all years (2001-2005 have 7 characters while 2006-2015 have 6 characters). The 7th character is extra and not needed.
• Since only municipal codes are provided, two new variables, birth_state_code and m_state_code are created by merging a municipality / state code lookup table.
• Municipality codes are converted to integers to save storage space.
• A few implausible values are set to NA: m_age_yrs of 0 or 99, brthwt_g of 0 or 9999, apgar1 and apgar5 of 99.
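To make a few of these steps concrete, here is a minimal Python sketch applied to a single record. It is not the preprocessing code itself, which was written in R and is not shown in this document. The function name `clean_record`, the dict-based record layout, and the sample values are hypothetical, and the sex recoding is only the illustrative mapping mentioned above; the real factor levels come from the data dictionary.

```python
from datetime import datetime

# Sentinel values the list above says are set to NA, keyed by variable name.
IMPLAUSIBLE = {
    "m_age_yrs": {0, 99},
    "brthwt_g": {0, 9999},
    "apgar1": {99},
    "apgar5": {99},
}

# Illustrative recoding only; real levels come from the data dictionary.
SEX_LEVELS = {1: "Male", 2: "Female"}


def clean_record(raw):
    """Apply a subset of the preprocessing steps to one record (a dict)."""
    rec = dict(raw)

    # Keep the first 6 characters of each municipality code, as an integer.
    for field in ("birth_muni_code", "m_muni_code"):
        rec[field] = int(str(raw[field])[:6])

    # Replace implausible sentinel values with None (the analogue of NA).
    for field, bad_values in IMPLAUSIBLE.items():
        if rec.get(field) in bad_values:
            rec[field] = None

    # Recode a numeric factor to a readable level.
    rec["sex"] = SEX_LEVELS.get(raw.get("sex"))

    # Parse the ddmmyyyy date and add the frequently used birth year.
    birth_date = datetime.strptime(raw["birth_date"], "%d%m%Y").date()
    rec["birth_date"] = birth_date
    rec["birth_year"] = birth_date.year
    return rec


# Hypothetical raw record with a 7-character and a 6-character code.
raw = {
    "birth_muni_code": "3550308", "m_muni_code": "355030",
    "m_age_yrs": 99, "brthwt_g": 3200, "apgar1": 9, "apgar5": 99,
    "sex": 2, "birth_date": "15032004",
}
print(clean_record(raw))
```

The state code variables are left out of this sketch because they depend on the municipality / state lookup table, which is not reproduced here.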