Santa Clara County, California/Social distancing protocol import

From OpenStreetMap Wiki

Volunteers affiliated with Open Source San José (formerly Code for San José) are carefully importing tens of thousands of business facilities based on social distancing protocols that business owners have filed with the Santa Clara County Public Health Department under COVID-19 public health orders.


Goal

This import will revamp OSM's coverage of retail, commercial, and industrial points of interest in the South Bay. We envision bringing fresher, more comprehensive POI coverage to OSM than most proprietary datasets currently offer.

Until now, local mappers have largely collected POIs ad hoc through field surveying and armchair mapping from Mapillary footage. A 2017 analysis found this coverage to be uneven across business categories compared to the local business telephone directory. There are also concerns that this coverage underrepresents minority-owned businesses and small businesses.

Since the COVID-19 pandemic began, most POI data has been at risk of going stale due to temporary or permanent closures or changes in opening hours or services. Various city and neighborhood business associations have compiled listings of members that are open for business, but these listings are skewed toward certain kinds of businesses, and the copyright situation is unclear (or at least not clear enough to rely on in OSM).

The Santa Clara County Public Health Department is keeping track of businesses that are open during the pandemic, along with their contact information and self-declared compliance with local COVID-19 safety orders. We expect these business names and addresses to be highly accurate. These POIs will form a solid foundation that we can continue to build upon incrementally after the pandemic through field surveying and other means.

Schedule

We have not yet developed a timetable for completing this import. Our only concrete deadline is that we want to complete this import before the pandemic subsides to the point where the public health department no longer requires businesses to submit SDPs.

  • September 12, 2020: Project kickoff as part of Code for San José's local edition of Code for America's National Day of Civic Hacking
  • September 17: Initial test scrape of SDP directory
  • October 13: California Department of Public Health moves Santa Clara County to Tier 3; existing SDPs are invalid within 14 days
  • October 15: Draft mapping of business categories to iD presets and/or tags
  • November 9: Proposal drafted on the wiki
  • November 16: Scrape of SDP directory at high water mark before move to Tier 1
  • November 17: California Department of Public Health moves Santa Clara County to Tier 1; many businesses with SDPs must close, but unclear if remaining essential businesses will file revised SDPs
  • November 19: Request for comments posted to the talk-us-sfbay, imports-us, and imports mailing lists and the #imports channel on OSMUS Slack
  • November 26: Smallest MapRoulette challenges open for mapping
  • December 3: Largest MapRoulette challenges open for mapping

The import may take weeks to complete, depending on the number of participants.

Source

This import pulls together three data sources:

Social distancing protocols

The geocoded SDPs are concentrated in the populated parts of Santa Clara County, especially in retail areas along major arterial roads.

The Santa Clara County Public Health Department has created a non-machine-readable database of businesses and institutions that have submitted social distancing protocols (SDPs). Under an October 5 public health order, all businesses and institutions must file a social distancing protocol with the department by October 27 to stay open. As of November 16, 2020 (the last day at Tier 3), the database includes 20,682 SDPs, with caveats:

  • Many establishments are either on-site services that lack a physical address or home businesses that do not accept customer visits. We plan to identify and omit these establishments.
  • An establishment can file multiple SDPs, each SDP superseding the previous one. The database generally excludes superseded SDPs, but it is up to the business to mark replacement SDPs; the database does not guarantee uniqueness.

Excluding establishments without physical addresses, 14,866 SDPs had enough information to import when the import began. Since then, many more SDPs have been added to the dataset.

Other datasets

The county also maintains an address point dataset containing addresses throughout the county. We are using this dataset to geocode the address in each SDP.

The Santa Clara Valley Transportation Authority, a special district covering Santa Clara County, publishes a land use dataset that we are using to provide a hint to mappers as to whether a geocoded POI is in a residential or nonresidential area.

License

The SDP listing, address point dataset, and land use dataset are all compiled by California local government agencies and are therefore in the public domain. (The County of Santa Clara was the defendant in a landmark case before a California appellate court that resulted in such works being in the public domain.)


This work is in the public domain in the United States because it is a work of the State of California that was in any way "involved in the governmental process" and "prepared, owned, used or retained by any state or local agency" or officer. That work is available pursuant to court interpretation of the Sunshine Amendment of the Constitution of California, and/or the California Public Records Act (CPRA), which contained no relevant provision(s) for copyright.

It is not copyrighted because (lacking an exception in statute like those for works of the Department of Toxic Substances Control or works of certain colleges established by statute) "unrestricted disclosure is required".

See County of Santa Clara v. CFAC. In brief, the "CPRA contains no provisions either for copyrighting [this work] or for conditioning its release on an end user or licensing agreement by the requester. The record thus must be disclosed as provided in the CPRA, without any such conditions or limitations." Subject to general disclaimers.


Preparation

Scraping

The county has not published a structured dataset corresponding to this listing, so we are resorting to scraping the SDP website to reconstruct the most relevant parts of the dataset. We have also scraped HTTP response headers from the submitted form PDFs, which contain additional fields that aren't displayed in HTML format. These additional fields will help us choose precise tags.

The SDP site is being updated every day. After the initial import kicks off, we will periodically rescrape the website for new listings. Business owners have the option to resubmit an SDP, replacing the previous submission. We will deduplicate submissions by hashing the name and address of each submission.
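The deduplication step could look roughly like the following sketch. It assumes each scraped submission is a dictionary with hypothetical `name` and `address` keys and that submissions arrive ordered from oldest to newest; the normalization rules and the choice of SHA-256 are illustrative assumptions, not necessarily what the import scripts use.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def submission_key(name: str, address: str) -> str:
    """Return a stable hash of a submission's normalized name and address."""
    # A separator character keeps ("ab", "c") distinct from ("a", "bc").
    payload = normalize(name) + "\x1f" + normalize(address)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def deduplicate(submissions: list[dict]) -> list[dict]:
    """Keep only the most recent submission for each name/address key.

    Assumes the input is ordered oldest to newest, so a resubmitted SDP
    overwrites the earlier one that hashes to the same key.
    """
    latest: dict[str, dict] = {}
    for submission in submissions:
        key = submission_key(submission["name"], submission["address"])
        latest[key] = submission
    return list(latest.values())
```

Hashing a normalized form means that differences in capitalization, punctuation, or spacing between two filings do not produce two POIs, although genuinely different spellings of the same business would still need manual review.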

Geocoding

The SDP listing includes hand-entered addresses but no coordinates. To forward-geocode the addresses to coordinates, we loaded addresses from the county's address point dataset into Pelias. The address point dataset may be related to the dataset that we are currently using in the San José building import.
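As a rough illustration of forward geocoding against Pelias, the sketch below queries the `/v1/search` endpoint and reads the first result from the GeoJSON response. The host and port in `PELIAS_URL` are assumptions about a local instance, and the helper names are hypothetical; this is not the import's actual script.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a locally hosted Pelias instance; adjust host and port as needed.
PELIAS_URL = "http://localhost:4000/v1/search"


def build_search_url(address: str, base_url: str = PELIAS_URL) -> str:
    """Build a Pelias search URL restricted to the address layer."""
    query = urllib.parse.urlencode(
        {"text": address, "size": 1, "layers": "address"}
    )
    return f"{base_url}?{query}"


def best_match(response: dict):
    """Return (lat, lon) of the first feature, or None if nothing matched."""
    features = response.get("features", [])
    if not features:
        return None
    # GeoJSON stores coordinates as [longitude, latitude].
    lon, lat = features[0]["geometry"]["coordinates"][:2]
    return lat, lon


def geocode(address: str):
    """Geocode one hand-entered address; requires a running Pelias server."""
    with urllib.request.urlopen(build_search_url(address), timeout=10) as resp:
        return best_match(json.load(resp))
```

Addresses that return `None` would need to be set aside for manual review rather than imported.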

Post-processing

To assist mappers, extra fields are added to the geocoded points. A QGIS script correlates the points with a zoning map to indicate which locations may actually be in residential zones; measures the distance from each point to nearby relevant OSM features to judge how likely it is that the neighborhood is already well mapped (and thus lower priority); and splits the list into multiple layers by type of business, so that each category can be made into a separate MapRoulette challenge.
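The distance measurement can be illustrated outside QGIS with plain Python. The sketch below swaps in a haversine great-circle distance and a brute-force nearest-neighbor search; the function names are hypothetical, and the QGIS script itself may work quite differently.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS 84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_distance_m(point, existing_pois):
    """Distance from a (lat, lon) point to the closest existing OSM POI.

    Returns None when there are no existing POIs to compare against.
    """
    if not existing_pois:
        return None
    return min(haversine_m(*point, *other) for other in existing_pois)
```

A small distance suggests that the surrounding area already has POI coverage, so the task might be lower priority or might duplicate an existing feature.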

Tagging

This table summarizes how each category in the SDP listing corresponds to one or more iD presets and feature tags. Some categories are quite broad, so participants in this import will need to choose between multiple presets on a case-by-case basis.

The "Other, please specify" category is particularly challenging. Business owners have the opportunity to clarify the line of business in a freeform field, which we have scraped from metadata attached to the submitted PDFs. We have not yet chosen an efficient way to associate the freeform responses with tags.

Aside from feature tags, the following secondary tags will generally appear on imported features:

We previously considered the following tags but decided against them:

  • opening_hours:covid19=open – Month-by-month fluctuation between tiers means data consumers could not be confident about the accuracy of this tag.
  • If "Facility/Worksite visited by public" is "NO", some categories like "Construction" could be tagged access=private; for others, that will be a signal that the business should not be mapped. However, the responses seemed to be unreliable in a spot-check, and extracting the field from the PDFs would have been challenging.
  • safety:hand_sanitizer:covid19=* – The "Hand sanitizer and/or soap and water are available at or near the site entrance…" checkbox corresponds well to this tag, but we expect most POIs to provide hand sanitizer, and extracting the field from the PDFs would have been challenging.

We do not plan to tag POIs with the phone numbers listed on the SDP website. At a glance, many of the phone numbers appear to be personal cell phones of store managers or compliance staff.

Results

We will upload this series of GeoJSON files to MapRoulette, one challenge per category. Before we upload the GeoJSON files, we will join them with the business type descriptions.
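A minimal sketch of that join and split follows, assuming each geocoded record is a dictionary with hypothetical `name`, `address`, `category`, `lat`, and `lon` keys and that `descriptions` maps a category to its business type description. It produces one GeoJSON FeatureCollection per category.

```python
import json
from collections import defaultdict


def to_feature(record: dict, descriptions: dict) -> dict:
    """Convert one geocoded SDP record into a GeoJSON Point feature."""
    return {
        "type": "Feature",
        "geometry": {
            "type": "Point",
            # GeoJSON order is [longitude, latitude].
            "coordinates": [record["lon"], record["lat"]],
        },
        "properties": {
            "name": record["name"],
            "address": record["address"],
            "category": record["category"],
            # Join in the business type description, if one is known.
            "description": descriptions.get(record["category"], ""),
        },
    }


def challenges_by_category(records: list, descriptions: dict) -> dict:
    """Group features into one FeatureCollection per SDP category."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["category"]].append(to_feature(record, descriptions))
    return {
        category: {"type": "FeatureCollection", "features": features}
        for category, features in grouped.items()
    }
```

Each collection can then be serialized with `json.dumps` and saved as the source file for that category's challenge.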

Workflow

This MapRoulette project contains one challenge per business category. Each challenge consists of one business facility per task. The task's instructions will suggest presets or tags to choose from. This document provides more detailed guidance. The largest challenges will be hidden initially while we make sure the workflow runs smoothly with the smaller categories.

Some challenges require more manual mapping because businesses in the category are not reliably mappable. For example, the "Alternative Non-hotel Guest Accommodations" category included many home Airbnbs before the SDP form was revised in October. Many of the listings in the "Construction" category are for work sites at premises that may normally house a different business.

The mapper is responsible for conflating the imported POI with nearby existing data and spot-checking the business against available aerial or street-level imagery to verify that the business is not obviously a home office. We are investigating providing additional signals as part of the per-task instructions for cases where street-level imagery is unavailable. Since Pelias's handling of unit numbers is limited, the mapper should also try to refine the business's location if it lies inside a strip mall or office building.

Changesets will credit source=Santa Clara County Public Health Department along with any imagery_used=* added by iD. Changeset comments will include the hashtags #c4sj, #South-Bay-OSM, and #maproulette.

Various followup tasks will be possible outside the import. For example, we can review POIs last edited before the beginning of the import to see if they still exist (even in pre-pandemic street-level imagery).

Participants

The following Open Source San José volunteers are leading the effort to import SDPs:

This MapRoulette leaderboard shows who has contributed to the import's challenges. We encourage anyone in the local community to help us import the SDPs.

Statistics

A spot-check of the SDP website as of November 23 against the latest County Business Patterns data from the Census Bureau suggests that importing the SDP database could significantly improve our POI coverage in the South Bay:

NAICS 2017 | NAICS description | Establishments (Census Bureau, 2018) | SDP category | SDPs
11 | Agriculture, forestry, fishing and hunting | 27 | Agriculture | 49
31–33 | Manufacturing | 2168 | Manufacturing | 718
441, 8111 | Motor vehicle and parts dealers; Automotive repair and maintenance | 1286 | Bicycle and Auto Repair/Supply; Car Detailing | 259
442, 443, 444, 446, 448, 451, 452, 453, 454 | Furniture and home furnishings stores; Electronics and appliance stores; Building material and garden equipment and supplies dealers; Health and personal care stores; Clothing and clothing accessories stores; Sporting goods, hobby, musical instrument, and book stores; General merchandise stores; Miscellaneous store retailers; Nonstore retailers | 3155 | Retail | 2038
445 | Food and beverage stores | 751 | Grocery Stores and Other Non-restaurant Food Facilities | 509
447 | Gasoline stations | 288 | Gas stations | 179
505 | Depository credit intermediation | 505 | Banks and Other Financial Institutions | 383
53 | Real estate and rental and leasing | 2880 | Real estate | 727
54 | Professional, scientific, and technical services | 8892 | Professional Services (Legal, Accounting, etc.) | 742
6244 | Child day care services | 733 | Childcare | 455
71394, 71395 | Fitness and recreational sports centers; Bowling centers | 315 | Gyms and Indoor Sports/Fitness Facilities | 387
7211 | Traveler accommodation | 274 | Alternative Non-hotel Guest Accommodations (e.g., AirBnB); Hotels/Motels | 164
7224 | Drinking places (alcoholic beverages) | 143 | Bars and nightclubs | 57
72251 | Restaurants and other eating places | 4164 | Restaurant | 3279
81211 | Hair, nail, and skin care services | 980 | Hair Salons and Barbershops; Nail Salons | 1095
81219 | Other personal care services | 166 | Misc. Personal Services (Massage, Waxing, Tanning, Tattoo, skincare, piercing, etc.) | 347
8123 | Drycleaning and laundry services | 210 | Laundromats, Laundry Services, and Drycleaning | 115
8131 | Religious organizations | 522 | Religious Institutions | 380
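One way to read these figures is as a ratio of SDPs to Census Bureau establishments for each category. The sketch below computes that ratio for a few rows copied from the comparison above; the `coverage` helper is hypothetical. The ratio is only a rough indicator, because the NAICS and SDP categories do not line up exactly and an establishment can file more than one SDP.

```python
# (SDP category, Census Bureau establishments in 2018, SDPs), taken from
# the comparison above.
ROWS = [
    ("Agriculture", 27, 49),
    ("Manufacturing", 2168, 718),
    ("Restaurant", 4164, 3279),
    ("Religious Institutions", 522, 380),
]


def coverage(establishments: int, sdps: int) -> float:
    """Ratio of filed SDPs to establishments counted by the Census Bureau."""
    return sdps / establishments


for category, establishments, sdps in ROWS:
    print(f"{category}: {coverage(establishments, sdps):.0%}")
```

A ratio above 100% most likely reflects a category mismatch or duplicate filings rather than complete coverage.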
