Community Development Publication

Small Business Credit Survey

2019 Report on Employer Firms in Texas

Appendix: Methodology

Data for this report are derived from the national Small Business Credit Survey (SBCS) effort. Weights and imputations are used to create a nationally representative sample. The following methodology write-up is from the 2017 Report on Employer Firms, available at fedsmallbusiness.org.

Data Collection

The Small Business Credit Survey (SBCS) uses a convenience sample of establishments. Businesses are contacted by email through a diverse set of organizations that serve the small business community. Prior SBCS participants and small businesses on publicly available email lists are also contacted directly by the Federal Reserve Banks. The survey instrument is an online questionnaire that typically takes 6 to 12 minutes to complete, depending on the intensity of a firm’s search for financing. The questionnaire uses branching, so the questions a firm sees depend on its earlier responses. For example, financing applicants receive a different line of questioning than nonapplicants. Therefore, the number of observations for each question varies with how many firms receive and complete that question.

Weighting

The SBCS sample is not selected randomly; thus, the SBCS may be subject to biases not present in surveys that do select firms randomly. For example, there are likely small employer firms not on our contact lists, which may lead to noncoverage bias. To control for potential biases, the sample data are weighted so that the weighted distribution of firms in the SBCS matches the distribution of the small (1 to 499 employees) firm population in the United States and Texas by number of employees, age, industry, geographic location (census division and urban or rural location), gender of owner(s), and race or ethnicity of owner(s).

We first limit the sample in each year to only employer firms. We then post-stratify respondents by their firm characteristics. Using a statistical technique known as “raking,” we compare the share of businesses in each category of each stratum (e.g., within the industry stratum, the share of firms in the sample that are manufacturers) to the share of small businesses in the nation that are in that category, and adjust respondents’ weights accordingly. As a result, underrepresented firms are upweighted and overrepresented firms are downweighted. We iterate this process several times for each stratum in order to derive a sample weight for each respondent. This weighting methodology was developed in collaboration with the National Opinion Research Center (NORC) at the University of Chicago.
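The raking step described above can be illustrated with a minimal Python sketch. It is not the SBCS production code: the function names, the two strata, the toy respondents, and the target shares are all hypothetical, chosen only to show how repeated adjustment pulls the weighted sample shares toward the population shares.

```python
def weighted_share(rows, weights, dim, category):
    """Weighted share of respondents whose `dim` equals `category`."""
    total = sum(weights)
    return sum(w for r, w in zip(rows, weights) if r[dim] == category) / total


def rake(rows, targets, n_iter=50, tol=1e-9):
    """Raking (iterative proportional fitting) over several strata.

    rows:    list of dicts mapping stratum name -> category for each respondent.
    targets: dict mapping stratum name -> {category: population share}.
    Returns one weight per respondent, scaled to sum to the sample size.
    """
    weights = [1.0] * len(rows)
    for _ in range(n_iter):
        max_shift = 0.0
        for dim, shares in targets.items():
            total = sum(weights)
            # Current weighted sample share of each category in this stratum.
            current = {}
            for row, w in zip(rows, weights):
                current[row[dim]] = current.get(row[dim], 0.0) + w / total
            # Upweight underrepresented categories, downweight overrepresented ones.
            for i, row in enumerate(rows):
                factor = shares[row[dim]] / current[row[dim]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:
            break  # every stratum already matches its targets
    scale = len(rows) / sum(weights)
    return [w * scale for w in weights]


# Hypothetical toy sample: five respondents, two strata.
rows = [
    {"industry": "manufacturing", "size": "1-4"},
    {"industry": "manufacturing", "size": "5-499"},
    {"industry": "services", "size": "1-4"},
    {"industry": "services", "size": "1-4"},
    {"industry": "services", "size": "5-499"},
]
# Hypothetical population shares (not taken from Census data).
targets = {
    "industry": {"manufacturing": 0.2, "services": 0.8},
    "size": {"1-4": 0.6, "5-499": 0.4},
}

weights = rake(rows, targets)
print(weighted_share(rows, weights, "industry", "manufacturing"))
print(weighted_share(rows, weights, "size", "1-4"))
```

In this toy sample, manufacturers make up 40 percent of respondents but only 20 percent of the assumed population, so their weights fall below one while the weights of service firms rise.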

The data used for weighting come from data collected by the U.S. Census Bureau. We are unable to obtain exact estimates of the combined racial and ethnic ownership of small employer firms for each state or at the national level. To derive these figures, we assume that the distribution of small employer firm owners’ combined race and ethnicity is the same as that for all firms in a given state. Given that small employer firms represent 99.7 percent of businesses with paid employees, we expect this assumption to align relatively closely with the true population.

In addition to the main weight, state- and Federal Reserve District-specific weights are created. While the same weighting methodology is employed, the variables used differ slightly from those used to create the main weight. Estimates for Federal Reserve Districts are calculated based on all small employer firms in any state that is at least partially within a District’s boundary. Federal Reserve District-level weights are created for each District using the weighting process described above, but based on observations in the relevant states.

Race/Gender Imputation

Sixteen percent of respondents completed the survey but did not provide information on the gender, race, and/or ethnicity of their business’s owner(s). This information is needed to correct for differences between the sample and the population data. To avoid dropping these observations, a series of statistical models is used to attempt to impute the missing data. When the models are able to predict with an average accuracy of 80 percent in out-of-sample tests, the predicted values from the models are used for the missing data. When the models are less certain, those data are not imputed and the responses are dropped. After data are imputed, descriptive statistics of key survey questions with and without imputed data are compared to ensure stability of estimates. In the final sample, 13 percent of observations have imputed values for the gender, race, or ethnicity of a firm’s ownership.

To impute owners’ race and ethnicity, a series of logistic regression models is used that incorporates a variety of firm characteristics, as well as demographic information on the zip code of the business’s headquarters. First, a logistic regression model is used to predict whether business owners are members of a minority group. Next, for firms classified as minority-owned, a logistic probability model is used to predict whether the majority of a business’s owners are of Hispanic ethnicity. Finally, the race of the majority of business owners is imputed separately for Hispanic and non-Hispanic firms using a multinomial logistic probability model.
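The three race and ethnicity stages described above form a cascade, which the following Python sketch illustrates. It is an illustration under stated assumptions, not the models used for the SBCS: the fitted models are replaced by hypothetical stand-in functions that return probabilities, the 0.5 classification cutoff is an assumption, and the race labels and zip code features are placeholders.

```python
from typing import Callable, Dict, Optional

Firm = Dict[str, float]                        # hypothetical firm features
BinaryModel = Callable[[Firm], float]          # returns P(class = 1)
MultiModel = Callable[[Firm], Dict[str, float]]  # returns P(race category)


def impute_race_ethnicity(
    firm: Firm,
    p_minority: BinaryModel,
    p_hispanic: BinaryModel,
    race_probs_hispanic: MultiModel,
    race_probs_non_hispanic: MultiModel,
    cutoff: float = 0.5,  # assumed cutoff; the source does not state one
) -> Dict[str, Optional[object]]:
    # Stage 1: is the firm classified as minority-owned?
    if p_minority(firm) < cutoff:
        return {"minority": False, "hispanic": None, "race": None}
    # Stage 2: among minority-owned firms, is the majority of owners Hispanic?
    hispanic = p_hispanic(firm) >= cutoff
    # Stage 3: a separate multinomial model for Hispanic and non-Hispanic firms.
    race_model = race_probs_hispanic if hispanic else race_probs_non_hispanic
    probs = race_model(firm)
    race = max(probs, key=probs.get)  # most probable race category
    return {"minority": True, "hispanic": hispanic, "race": race}


# Stand-in models for illustration only; real models would be fitted regressions.
toy_firm = {"zip_minority_share": 0.7, "zip_hispanic_share": 0.2}
result = impute_race_ethnicity(
    toy_firm,
    p_minority=lambda f: f["zip_minority_share"],
    p_hispanic=lambda f: f["zip_hispanic_share"],
    race_probs_hispanic=lambda f: {"race_a": 0.6, "race_b": 0.4},
    race_probs_non_hispanic=lambda f: {"race_a": 0.3, "race_b": 0.7},
)
print(result)
```

Because each later stage runs only on firms classified at the earlier stage, an error in the first model carries through to the ethnicity and race predictions, which is one reason to check out-of-sample accuracy before using the imputed values.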

A similar process is used to impute for the gender of a business’s ownership. First, a logistic model is used to predict if a business is primarily owned by men. Then, for firms not classified as men-owned, another model is used to predict if a business is owned by women or is equally owned.
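The two-step gender imputation can be sketched the same way. The Python example below is a sketch only: both models are hypothetical stand-ins that return probabilities, and the 0.5 cutoff and feature names are assumptions rather than details from the survey methodology.

```python
from typing import Callable, Dict

Firm = Dict[str, float]                # hypothetical firm features
BinaryModel = Callable[[Firm], float]  # returns P(class = 1)


def impute_owner_gender(
    firm: Firm,
    p_men_owned: BinaryModel,
    p_women_owned: BinaryModel,
    cutoff: float = 0.5,  # assumed cutoff; the source does not state one
) -> str:
    # Step 1: is the business primarily owned by men?
    if p_men_owned(firm) >= cutoff:
        return "men-owned"
    # Step 2: for the remaining firms, women-owned versus equally owned.
    if p_women_owned(firm) >= cutoff:
        return "women-owned"
    return "equally owned"


# Stand-in models for illustration only; real models would be fitted regressions.
toy_firm = {"score_men": 0.3, "score_women": 0.8}
label = impute_owner_gender(
    toy_firm,
    p_men_owned=lambda f: f["score_men"],
    p_women_owned=lambda f: f["score_women"],
)
print(label)
```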