@@ -0,0 +1,10 @@
# ActivitySim Application and Analysis Guide

This guide demonstrates to modelers how to apply ActivitySim to the analysis of various projects. It is intended to show exactly how to change the inputs to test a particular project or policy, how to process the outputs to best answer the question that was asked, and, more generally, what ActivitySim can and can't do.

There are presently three example scenarios using the [SANDAG ABM3 Example model](https://github.com/activitysim/sandag-abm3-example), though more may be added in the future. Before running any of them, it is recommended to run the SANDAG example, as that will download the full data and provide a baseline run to compare each scenario to. Each example scenario comes with a step-by-step guide for changing the inputs, along with notebooks demonstrating how to calculate key metrics from the model outputs.

## Example scenarios
- [Land Use Change](land-use-change/land_use_change.md)
- [Network Change](network-change/network_change.md)
- [Telecommuting Change](telecommute-change/telecommute_change.md)

Thanks @JoeJimFlood, job well done. Following up on what I sent you earlier this week in reference to this TOD scenario:

  • How are the (increased) willingness and propensity to ride transit reflected in this TOD scenario, from a travel behavior sensitivity perspective? Perhaps in the utility expression calculations? In tour mode choice and trip mode choice utility functions?
  • How does this TOD scenario determine the impact of TOD on Single-Occupant Vehicle trips, in terms of significance?

Thanks for the comments @guyrousseau:

  • The scenario currently doesn't describe editing any of the utility expressions, but as the residents of the new development follow characteristics of TOD it could be expected that they'd be more likely to take transit, etc.
  • I can add that to the metrics that are calculated.

@@ -0,0 +1,131 @@
# Land Use Change Guide

## Introduction

Many contemporary urban planners are encouraging developers to build denser housing, particularly around transit stops. Naturally, planners will want to gauge the impact of such a development on their jurisdiction's transportation system, particularly regarding metrics such as VMT (and subsequently greenhouse gas emissions) and transit boardings (and subsequently farebox revenue). To demonstrate this, we will analyze a hypothetical development in the San Diego region. The development will add 2000 households and 1000 retail jobs in the vicinity of the Grossmont Station on the Green and Orange Lines of San Diego's light rail system, where there is an existing auto-oriented shopping mall. The guide will show how to make changes to the ActivitySim inputs, how to run the test, and how to calculate some of the key metrics such as VMT and changes in mode share.

  • If this is intended to be a hypothetical example (as later text makes clear), then I would drop references to specific places. For example, just say, "we will be analyzing a hypothetical mixed use development near transit"


![A map of the Study Area. The study area (Grossmont Center) is highlighted in red and the location of the nearby Grossmont Trolley Station is highlighted](LandUseStudyAreaMap.png)
*Satellite imagery from Google*

**NOTE: The example provided is a hypothetical project that demonstrates how one would use ActivitySim to model the effects of a land use change and does not necessarily reflect any real planned developments.**

## Setting Up the Scenario

Three input files need to be changed in order to run this test: the land use file (landuse.csv) and the files defining the synthetic population (households.csv and persons.csv). While updating the land use file may seem straightforward, it is easy to overlook necessary changes, which could result in the model understating the impact of the change. A modeler needs to edit not only the household and employment fields in the study area but also any field derived from those fields. For example, every new household will have at least one person in it, so the population field will need to be updated as well (along with population density if that is present). If the total population is to be kept the same, households will need to be removed outside of the study area as well.

  • Is the file landuse.csv or land_use.csv?
  • Perhaps clarify: "If the total population is to be kept the same in the entire regional modeling area"
  • In general, I think it would be good to identify specific fields that need to be updated
  • The subsequent discussion deals with the synthetic population first, and then the land use, so perhaps reflect this order in this intro paragraph
  • I'm a little confused by why the intro paragraph starts to get into the weeds a bit (e.g. talking about updating density in the LU field, and then discussing how density is buffer-based in the SANDAG model)


Because activity-based models use synthetic populations, those input files will need to be updated to reflect the changed distribution of the population. There are multiple ways that this could be done. The ActivitySim consortium maintains the PopulationSim population synthesis software, which includes a `repop` mode that can be used to add households to an existing synthetic population. This demonstration will show how to do this, though any user is welcome to add the additional households in whatever way works best for them (for example, with a custom script).

  • Clarify what is meant by "the different distribution in the population" - is this referring to the geographic distribution? The demographic distribution(s)?
  • I find it slightly confusing to say "any user is welcome to add the additional households in whatever way works best for them (such as through a script)". I understand the challenge that there could be any number of ways to adjust a population, but rather than just say "through a script", maybe lay out some options like, "you can repop, you can shift locations of existing synth pop, you can generate an entirely new pop"


### Instructions

1. The first step is to update the synthetic population. This can be done using PopulationSim's repop mode, which adds population on top of existing PopulationSim outputs (a pipeline file is needed). A more detailed explanation of PopulationSim's repop mode can be found in [PopulationSim's documentation](https://activitysim.github.io/populationsim/application_configuration.html#configuring-settings-file-for-repop-mode), but this guide briefly provides some examples of how PopulationSim can be configured for this particular scenario.

First, the `run_list` in the settings file needs to be adjusted to have PopulationSim run the repop steps:

  • Mention the specific name of the settings file. Although individual implementations may vary in terms of the file names, field names, etc, to the extent possible try to make explicit.

```
run_list:
  steps:
    - input_pre_processor.repop
    - repop_setup_data_structures
    - initial_seed_balancing.final=true;repop
    - integerize_final_seed_weights.repop
    - repop_balancing
    # expand_households options are append or replace
    - expand_households.repop;append
    - summarize.repop
    - write_synthetic_population.repop
    - write_tables.repop
```

Next, `repop_control_file_name: repop_controls.csv` should be added to the settings file. This tells PopulationSim which file within the configs directory (configs\repop_controls.csv) defines the controls. The following configuration helps control for characteristics of the population within a TOD area. For example, TOD is more likely to attract smaller households that are more likely to include workers, more likely to be headed by younger adults, and less likely to have children than the general population.

Does the user need to first create the repop_controls.csv file? How does the configuration below reflect the anticipated TOD population?

| target | geography | seed_table | importance | control_field | expression |
|------------|-----------|------------|------------|---------------|----------------------------------------------------|
| num_hh | mgra | households | 1000000000 | Total_HH | (households.WGTP > 0) & (households.WGTP < np.inf) |
| HHSize_1 | mgra | households | 250000 | HHSize_1 | households.NP == 1 |
| HHSize_2 | mgra | households | 250000 | HHSize_2 | households.NP == 2 |
| HHWork_0 | mgra | households | 100000 | HHWork_0 | households.workers == 0 |
| HHWork_1 | mgra | households | 100000 | HHWork_1 | households.workers == 1 |
| HHWork_2 | mgra | households | 100000 | HHWork_2 | households.workers == 2 |
| HHChild_0 | mgra | households | 100000 | HHChild_0 | households.HUPAC == 4 |
| Age_18to24 | mgra | persons | 100000 | Age_18to24 | (persons.AGEP >= 18) & (persons.AGEP <= 24) |
| Age_25to34 | mgra | persons | 100000 | Age_25to34 | (persons.AGEP >= 25) & (persons.AGEP <= 34) |
| Age_35to44 | mgra | persons | 100000 | Age_35to44 | (persons.AGEP >= 35) & (persons.AGEP <= 44) |
| Age_45to54 | mgra | persons | 100000 | Age_45to54 | (persons.AGEP >= 45) & (persons.AGEP <= 54) |

Maybe explain why there are no entries in this table for HHSize_3 or greater, no age controls for populations <18 or >54 (how does that work exactly?)?

The totals for each zone are then specified in the control totals file, which is declared in the settings file as follows:

What is the name of the control total file? Where is it located? What is the difference between a filename and a tablename?

```
input_table_list:
  - filename : repop_control_totals.csv
    tablename: mgra_control_data
```

The values in this file can be set to add a population to the study area that is characteristic of a typical transit-oriented development:

| mgra | Total_HH | HHSize_1 | HHSize_2 | HHWork_0 | HHWork_1 | HHWork_2 | HHChild_0 | Age_18to24 | Age_25to34 | Age_35to44 | Age_45to54 |
|------|----------|----------|----------|----------|----------|----------|-----------|------------|------------|------------|------------|
| 579 | 500 | 250 | 150 | 50 | 240 | 120 | 400 | 150 | 200 | 200 | 150 |
| 4502 | 500 | 250 | 150 | 50 | 240 | 120 | 400 | 150 | 200 | 200 | 150 |
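
Before running PopulationSim, it may be worth sanity-checking the control totals for internal consistency. The following is a minimal sketch, not part of PopulationSim: `check_control_totals` is a hypothetical helper that assumes the column names from the table above and only checks that the household size, worker, and childless-household controls never exceed the household total in a zone.

```python
import pandas as pd


def check_control_totals(controls: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose partial category sums exceed Total_HH.

    Hypothetical helper; the column names follow the example control totals.
    """
    size_cols = ["HHSize_1", "HHSize_2"]
    work_cols = ["HHWork_0", "HHWork_1", "HHWork_2"]
    too_many_by_size = controls[size_cols].sum(axis=1) > controls["Total_HH"]
    too_many_by_work = controls[work_cols].sum(axis=1) > controls["Total_HH"]
    too_many_childless = controls["HHChild_0"] > controls["Total_HH"]
    return controls[too_many_by_size | too_many_by_work | too_many_childless]


# Household-level columns of the example control totals
controls = pd.DataFrame(
    {
        "mgra": [579, 4502],
        "Total_HH": [500, 500],
        "HHSize_1": [250, 250],
        "HHSize_2": [150, 150],
        "HHWork_0": [50, 50],
        "HHWork_1": [240, 240],
        "HHWork_2": [120, 120],
        "HHChild_0": [400, 400],
    }
)
problems = check_control_totals(controls)  # empty if the totals are consistent
```

An empty result does not guarantee that PopulationSim can match every control, only that the household-level totals do not contradict each other.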

The output synthetic population files then need to be placed in the `data` directory for the ActivitySim run.

Meaning households.csv and persons.csv? The more explicit you are, the better.


2. The land use file now needs to be updated to reflect the updated population. These lines of code update the household and population values within the land use file (assuming that the synthetic population exists as data frames called `households` and `persons` and the land use file is a data frame called `land_use`).

Maybe you should provide instruction how someone can have access to the required dataframes?

```
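# Assumes land_use is indexed by zone ID so that the assignments below align by zone
land_use["hh"] = households.groupby("home_zone_id").size().reindex(land_use.index, fill_value=0)
# Look up each person's home zone through their household
person_zone = persons["household_id"].map(households.set_index("household_id")["home_zone_id"])
land_use["pop"] = persons.groupby(person_zone).size().reindex(land_use.index, fill_value=0)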
```
Further variables will need to be edited as well. For example, the SANDAG example contains variables for the number of housing units in each MAZ, including those that are vacant (so this will be higher than the number of households). There are also variables for population density. It may initially seem that one could calculate them by dividing the updated population by the total area. While some models may use population density calculated in that way, the population density variables in the SANDAG model are the population within a buffer and are calculated in a preprocessing step that is external to ActivitySim (which should be rerun in this particular scenario). One should pay close attention to how each variable is defined to reduce the risk of a misunderstanding causing incorrect model results.
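
One way to catch an overlooked field is to compare the edited land use file against the synthetic population. The sketch below is illustrative only: `find_household_mismatches` is a hypothetical helper, and it assumes the land use table has a zone column named `MAZ` and a household count column named `hh`, which may differ in other implementations.

```python
import pandas as pd


def find_household_mismatches(land_use, households, zone_col="MAZ", hh_col="hh"):
    """List zones where the land use household count differs from the synthetic population."""
    reported = land_use.set_index(zone_col)[hh_col]
    # Count synthetic households per zone; zones without households get 0
    synthetic = households.groupby("home_zone_id").size()
    synthetic = synthetic.reindex(reported.index, fill_value=0)
    comparison = pd.DataFrame({"land_use": reported, "synthetic": synthetic})
    return comparison[comparison["land_use"] != comparison["synthetic"]]


# Toy data: zone 2 claims one household that the synthetic population lacks
land_use = pd.DataFrame({"MAZ": [1, 2, 3], "hh": [2, 1, 1]})
households = pd.DataFrame({"household_id": [10, 11, 12], "home_zone_id": [1, 1, 3]})
mismatches = find_household_mismatches(land_use, households)
```

The same pattern could be applied to the population field by counting persons instead of households.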

I think you should explicitly list the fields that need to be updated given this example (acknowledging that other models may have different data items available, different naming conventions, etc).


3. The land use data then needs to be adjusted for the retail jobs. This is more straightforward than adjusting the population, as ActivitySim models don't typically have a synthetic set of establishments. As this particular scenario adds 1000 retail jobs to the TOD, the values of the retail and total employment fields simply need to be updated for the zones within the study area:
```
# Adjust employment
land_use = land_use.set_index("MAZ")
land_use.loc[579, "emp_ret"] += 500
land_use.loc[579, "emp_tot"] += 500
land_use.loc[4502, "emp_ret"] += 500
land_use.loc[4502, "emp_tot"] += 500
land_use = land_use.reset_index()  # Optional: restores MAZ as a column for later operations that expect it
```

What is this block of code supposed to be applied to? Again, I think you should err on the side of being as explicit as possible.

It should be noted that the same caveat applies to fields derived from employment data, such as employment density or any aggregated fields that may be present. One should be careful to update all fields that are relevant to the total employment.
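
A quick check after editing is to confirm that the employment categories still add up to the total. The following sketch rests on an assumption that may not hold for every model: that the listed category columns are exhaustive and non-overlapping. `employment_total_gaps` is a hypothetical helper and `emp_other` is a made-up column used only for the toy data.

```python
import pandas as pd


def employment_total_gaps(land_use, category_cols, total_col="emp_tot", tol=1e-6):
    """Return zones where the category columns do not sum to the total column."""
    gap = land_use[category_cols].sum(axis=1) - land_use[total_col]
    return land_use.assign(gap=gap)[gap.abs() > tol]


# Toy data: emp_tot was not updated for MAZ 4502
land_use = pd.DataFrame(
    {
        "MAZ": [579, 4502],
        "emp_ret": [500, 500],
        "emp_other": [20, 0],  # hypothetical category column
        "emp_tot": [520, 400],
    }
)
gaps = employment_total_gaps(land_use, ["emp_ret", "emp_other"])
```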

## Running the Test

To run the test, run the following command:
```
uv run activitysim run -c configs\common -c configs\resident -d data_full -o output --ext extensions
```

## Analyzing the Results

This list of metrics feels a bit brief, given the scope that "Ideally, the Application & Analysis Guide should also provide guidance on potential analysis metrics as well as appropriate levels of aggregation."


The following code blocks demonstrate how to calculate key metrics from the model outputs. They all assume that the ActivitySim output files will be read in as a data frame where the name will be the same as the file name but without the prefix or the file extension (e.g. final_trips.csv will be read as trips).
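
As a sketch of that naming convention, the output tables could be loaded as follows. This assumes the outputs were written as CSV files with a `final_` prefix; `load_outputs` is a hypothetical helper, not an ActivitySim function.

```python
from pathlib import Path

import pandas as pd


def load_outputs(output_dir, prefix="final_"):
    """Read every '<prefix><name>.csv' in output_dir into a dict keyed by <name>."""
    tables = {}
    for path in sorted(Path(output_dir).glob(f"{prefix}*.csv")):
        name = path.stem[len(prefix):]  # e.g. final_trips.csv -> trips
        tables[name] = pd.read_csv(path)
    return tables
```

With this helper, `tables = load_outputs("output")` followed by `trips = tables["trips"]` would give the data frame names used in the rest of this section.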

### Vehicle Miles Traveled

  • Other metrics may be of interest, such as VMT / capita. Part of the goal of these guides is to identify a set of metrics that are relevant for the tests, and some guidance on how to interpret them, so that the guide can help both model users as well as managers understand what they should be looking at.
  • A key part of any metric calculation is the comparison to something else - to contextualize the metrics. Eg. Is per capita VMT greater or less than in other areas of the region? What is the percentage increase in aggregate VMT in the subarea as well as in the overall modeled area?

While the true modeled VMT requires assignment to be run, one can get a reasonable estimate via the ActivitySim outputs. The output trips table in the SANDAG ABM3 example actually includes fields called `distance` and `weightTrip`, which are created in the preprocessor for writing the outputs (write_trip_matrices_annotate_trips_preprocessor.csv). The `distance` field is created by [reading in the distance skim value](https://github.com/ActivitySim/sandag-abm3-example/blob/main/configs/resident/write_trip_matrices_annotate_trips_preprocessor.csv#L5) and the `weightTrip` field is a weight that [factors in the occupancy](https://github.com/ActivitySim/sandag-abm3-example/blob/main/configs/resident/write_trip_matrices_annotate_trips_preprocessor.csv#L7). The following lines of code compute the VMT using those particular fields:
```
auto_modes = ["DRIVEALONE", "SHARED2", "SHARED3", "TNC_SINGLE", "TNC_SHARED", "TAXI"]
auto_trips = trips[["trip_mode", "distance", "weightTrip"]].query("trip_mode in @auto_modes")
vmt = (auto_trips["distance"] * auto_trips["weightTrip"]).sum()
```
Now, not every ActivitySim implementation will have such a field in their outputs, so the calculation may not be as simple. If the distance field isn't added to the outputs, one will need to read in the skims in order to perform the calculation. One will also need to remember to factor in the occupancy, as an individual who is carpooling has less of an impact on VMT than a person who is driving alone.
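
For implementations without a trip weight field, the occupancy adjustment can be sketched as below. The occupancy values are placeholders rather than values from any model, `estimate_vmt` is a hypothetical helper, and the distance column is assumed to have already been attached to the trips (for example, from the skims). Modes absent from the occupancy dictionary are left out of the total.

```python
import pandas as pd

# Placeholder average vehicle occupancies; replace with the values your model uses
ASSUMED_OCCUPANCY = {"DRIVEALONE": 1.0, "SHARED2": 2.0}


def estimate_vmt(trips, occupancy=ASSUMED_OCCUPANCY, distance_col="distance"):
    """Estimate VMT by dividing each person trip's distance by its vehicle occupancy."""
    auto = trips[trips["trip_mode"].isin(list(occupancy))]
    # Each person trip counts as 1 / occupancy of a vehicle trip
    vehicle_share = 1.0 / auto["trip_mode"].map(occupancy)
    return (auto[distance_col] * vehicle_share).sum()


# Toy data: one 10-mile drive-alone trip, one 10-mile shared-ride trip, one walk trip
trips = pd.DataFrame(
    {
        "trip_mode": ["DRIVEALONE", "SHARED2", "WALK"],
        "distance": [10.0, 10.0, 1.0],
    }
)
vmt = estimate_vmt(trips)  # 10 + 10 / 2 = 15 vehicle miles
```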

### Mode Share
Calculating mode share from the ActivitySim outputs is fairly straightforward, as the modes are reported in the output. However, one needs to ask *which* mode share they'd like to know. For example, the simplest is the regional mode share, which can be calculated directly from the trips file:
```
mode_share = trips["trip_mode"].value_counts(normalize = True)
```
This will return the share of trips that use each mode, expressed as a fraction between 0 and 1. However, a single localized development won't move the needle much at the regional level, so it may be hard to tell if there was an impact. The following metric computes the *tour* mode share to work of households living in zones close to the transit stop, which should show a much larger difference from the baseline (there are no households in the study area in the baseline run, so the baseline mode share would be undefined).

I wouldn't think the point of this example would be to demonstrate how it moves the needle regionally, but rather, is transit mode share higher in the new development than in the region, which would be a step in the right direction.

```
station_area = [579, 4502, 8524, 7714, 12170, 12171, 5455, 8457, 846, 8232, 7831, 12172, 12173, 12174, 12175, 12176, 12177, 12178]
station_area_tours = tours[["home_maz", "tour_mode", "tour_purpose"]].query("home_maz in @station_area and tour_purpose == 'work'")
tour_mode_share_to_work = station_area_tours["tour_mode"].value_counts(normalize = True)
```
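
A share is easier to interpret next to a reference, such as the same measure for the region or for another run. The following sketch puts two sets of shares side by side; `compare_shares` is a hypothetical helper, and the mode labels in the toy data are illustrative only.

```python
import pandas as pd


def compare_shares(baseline: pd.Series, scenario: pd.Series) -> pd.DataFrame:
    """Tabulate the shares of each category in two series and their difference."""
    base = baseline.value_counts(normalize=True)
    scen = scenario.value_counts(normalize=True)
    # Categories missing from one side get a share of zero
    table = pd.concat([base, scen], axis=1, keys=["baseline", "scenario"]).fillna(0.0)
    table["change_pct_points"] = 100 * (table["scenario"] - table["baseline"])
    return table


# Toy data with illustrative mode labels
baseline_modes = pd.Series(["DRIVEALONE", "DRIVEALONE", "DRIVEALONE", "WALK"])
scenario_modes = pd.Series(["DRIVEALONE", "DRIVEALONE", "WALK", "TRANSIT"])
comparison = compare_shares(baseline_modes, scenario_modes)
```

The same function could be called with the regional tour modes as the first argument and the station area tour modes as the second to compare the new development with the region.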

### Auto Ownership
The calculation of the regional auto ownership rates is very straightforward, as that variable is reported directly in the households table:
```
auto_ownership = households["auto_ownership"].value_counts(normalize = True)
```
However, auto ownership has the same issue as mode share: the TOD will barely move the needle on the regional auto ownership rates. Therefore, a similar calculation restricted to the station area would need to be done:

Clarify that this similar calculation is to calculate this measure for the station area? (to compare to the region?)

```
station_area = [579, 4502, 8524, 7714, 12170, 12171, 5455, 8457, 846, 8232, 7831, 12172, 12173, 12174, 12175, 12176, 12177, 12178]
station_area_households = households.query("home_zone_id in @station_area")
station_area_auto_ownership = station_area_households["auto_ownership"].value_counts(normalize = True)
```