Homework 5 -- NYC Housing and School Analysis (Course Project Part 2)
Skills: 1, 2
Due
Thursday, October 9, 2025 at 6PM (Oakland) or 9PM (Boston)
Introduction
Building on your work with housing data from Homework 4, you'll now analyze the relationship between NYC housing patterns and school demographics by working with 2 datasets. You'll be working with two CSV files:
- housing.csv, which you worked with in Homework 4 and has the following columns:
- borough: The NYC borough (Manhattan, Brooklyn, Queens, Bronx, Staten Island)
- zipcode: ZIP code of the property
- address: Street address of the property
- landuse: Type of land use (residential, commercial, etc.)
- ownertype: Category of property ownership
- lotarea: Total lot area in square feet
- bldgarea: Total building area in square feet
- latitude: Latitude coordinate of the property
- longitude: Longitude coordinate of the property
- schools.csv, which represents NYC's public schools and has the following columns:
- schoolname: Name of the school
- latitude: Latitude coordinate of the school
- longitude: Longitude coordinate of the school
- address: Street address of the school
- city: City in which the school is located
- zip: ZIP code of the school
- percentasian, percentblack, percenthispanic, percentblackhispanic, percentwhite: Demographics (as percentage strings like "16%")
Problem 1
Reuse your code from Homework 4 to create housing-table
. Use it as a model to define a second table called schools-table
with the appropriate column names, types, and values for the NYC schools data.
Problem 2
Part A
Find which ZIP codes have both housing and schools to identify complete neighborhoods. Extract the "zipcode" column from housing-table
into a list called housing-zips
and the "zip" column from schools-table
into a list called school-zips
. Create unique-housing-zips
and unique-school-zips
containing only distinct ZIP codes from each dataset.
Part B
Design a function has-both-housing-and-schools
that takes a ZIP code and returns true if it appears in both ZIP code lists. Use this function with the appropriate list operation to create complete-neighborhoods
containing ZIP codes that have both housing and schools.
Problem 3
Analyze residential building sizes in areas with schools by filtering housing-table
to create residential-properties
containing only properties with landuse codes "1", "2", or "3" (residential buildings). Extract the ZIP codes from these residential properties into residential-zips
. Use the appropriate list operation to create residential-zips-with-schools
containing only residential ZIP codes that also appear in your school ZIP codes list (unique-school-zips
) from Problem 2.
Problem 4
In this problem we will calculate distances; we will approximate by using the standard distance formula (distance between two points, latitude,longitude, is the square root of the difference of latitudes squared plus the difference of longitudes squared). This assumes that the earth is flat, but at close enough distances, is a reasonable approximation.
Part A
First, we want to find the geographic center of all housing and all schools -- the average
latitude and average longitude. Extract the latitude
and longitude
fields from both
datasets, and then calculate:
center-housing-lat
: Average latitude of all housingcenter-housing-lon
: Average longitude of all housingcenter-school-lat
: Average latitude of all schoolscenter-school-lon
: Average longitude of all schools
Part B
Design a function simple-distance
that takes a latitude and longitude and another latitude and another longitude and returns the distance between, using the distance formula. It should return an ExactNum, so use num-exact
on the result of num-sqrt
.
Now design a function find-closest-school
that, given a latitude and
longitude, identifies the closest school that is the closest to the given
latitude and longitude. You probably want to use build-column
, to add a
distance to every school, and then order-by
, to sort by that distance, before
extracting the closest (i.e., first) row.
Problem 5
Because of the city’s growing population, the city’s Department of Education is considering adding new schools. The city currently has the funds to be able to build two new schools and has asked your office to recommend good locations. As you can imagine, choosing a “good” location for a school is complex and requires careful consideration of several factors, including environmental ones.
Part A.
Based on the stakeholder matrix that you completed in Homework 4, identify the data you’ll need to have, in order to determine two different good locations for new schools. Explain your answer in two or three sentences.
Part B
In two or three sentences, explain whether that data is already given, or can be easily inferred, from what is in the stakeholder matrix. For example: do any of the columns in your matrix provide it? If not, then how might you get that data?