Addition of Application and Analysis Guide by JoeJimFlood · Pull Request #1086 · ActivitySim/activitysim

JoeJimFlood · 2026-06-25T16:59:40Z

This pull request adds the first iteration of the ActivitySim Application and Analysis guide. It includes guides on how to perform three scenarios that are under the existing user guide:

Land Use Change: The addition of a transit-oriented development along a light rail line
Network Change: The addition of a BRT line
Telecommute Change: A return to the office scenario

guyrousseau · 2026-06-25T17:48:08Z

Thanks @JoeJimFlood, job well done. Following up on what I sent you earlier this week in reference to this TOD scenario:

How are the (increased) willingness and propensity to ride transit reflected in this TOD scenario, from a travel behavior sensitivity perspective? Perhaps in the utility expression calculations? In tour mode choice and trip mode choice utility functions?

How does this TOD scenario determine the impact of TOD on Single-Occupant Vehicle trips, in terms of significance?

Thanks for the comments @guyrousseau:

The scenario currently doesn't describe editing any of the utility expressions, but as the residents of the new development follow characteristics of TOD it could be expected that they'd be more likely to take transit, etc.

I can add that to the metrics that are calculated.

mmilkovits · 2026-06-26T20:18:44Z

Hi Joe - great example, would be something to work in "Barbenheimer" but that may be pushing it.

I had a couple of thoughts on potential extensions to the guide:

It could be useful to highlight specific QA/QC steps - such as how to verify that the population change was made correctly
To demonstrate the ABM's value and reiterate the model operation, report metrics from model components in their order of operation (e.g., examine vehicle availability, work location, DAP, etc.). VMT and mode share are obviously important and familiar to modelers accustomed to trip-based models, but I think showing more of the ABM details will help demonstrate the added value
I think something on judging the reasonableness of the model response could be useful here (speaking to a higher level audience). The point could be made by comparing with other areas, comparing across market segments, or between existing and new households.
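As a rough illustration of the first suggestion (verifying that the population change was made correctly), a check along these lines might work. It is only a sketch: it assumes the baseline and scenario household tables are pandas data frames with `household_id` and `home_zone_id` columns, and the zone list and expected count in the toy data are illustrative rather than taken from the guide.

```python
import pandas as pd


def check_added_households(baseline, scenario, study_zones, expected_added):
    """Summarize how household counts by zone differ between two runs."""
    base_counts = baseline.groupby("home_zone_id").size()
    scen_counts = scenario.groupby("home_zone_id").size()
    # Zones missing from one table are treated as having zero households.
    diff = scen_counts.sub(base_counts, fill_value=0).astype(int)
    changed_zones = {int(z) for z in diff[diff != 0].index}
    return {
        "added_total": int(diff.sum()),
        "matches_expected": int(diff.sum()) == expected_added,
        "changes_outside_study_area": sorted(changed_zones - set(study_zones)),
        "household_ids_unique": bool(scenario["household_id"].is_unique),
    }


# Toy data (illustrative only): three households are added in two zones.
baseline = pd.DataFrame(
    {"household_id": [1, 2, 3], "home_zone_id": [10, 10, 11]}
)
scenario = pd.DataFrame(
    {
        "household_id": [1, 2, 3, 4, 5, 6],
        "home_zone_id": [10, 10, 11, 579, 579, 4502],
    }
)
report = check_added_households(
    baseline, scenario, study_zones=[579, 4502], expected_added=3
)
print(report)
```

A non-empty `changes_outside_study_area` list would flag zones whose household counts changed unexpectedly.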

JoeJimFlood · 2026-06-29T21:38:50Z

> Hi Joe - great example, would be something to work in "Barbenheimer" but that may be pushing it.
>
> I had a couple of thoughts on potential extensions to the guide:
>
> It could be useful to highlight specific QA/QC steps - such as how to verify that the population change was made correctly
>
> To demonstrate the ABM's value and reiterate the model operation, report metrics from model components in their order of operation (e.g., examine vehicle availability, work location, DAP, etc.). VMT and mode share are obviously important and familiar to modelers accustomed to trip-based models, but I think showing more of the ABM details will help demonstrate the added value
>
> I think something on judging the reasonableness of the model response could be useful here (speaking to a higher level audience). The point could be made by comparing with other areas, comparing across market segments, or between existing and new households.

@mmilkovits Thanks for the comments! I'm now thinking of maybe sneaking a covert reference or two into it...

1. I can add a brief section on that.
2. I'm concerned that adding an output from every step would make the analysis section too long, especially as many wouldn't change from the baseline run. Maybe I could add summaries of a few key steps that aren't in trip-based models.
3. I'll incorporate that into the summary at the end.

joecastiglione · 2026-06-30T01:04:57Z

+
+## Introduction
+
+Many contemporary urban planners are encouraging developers to build denser housing, particularly around transit stops. Naturally, planners will want to gauge what the impact of such a development would be on their jurisdiction's transportation system, particularly regarding metrics such as VMT (and subsequently greenhouse gas emission) and transit boardings (and subsequently farebox revenue). To demonstrate this, we will be analyzing a hypothetical development in the San Diego Region. The particular development will add 2000 households and 1000 retail jobs in the vicinity of the Grossmont Station on the Green and Orange Lines of San Diego's light rail system, where there is an existing auto-oriented shopping mall. The guide will show how to make changes to the ActivitySim inputs, how to run the test, and how to calculate some of the key metrics such as VMT and changes in mode share.


If this is intended to be a hypothetical example (as later text makes clear), then I would drop references to specific places. For example, just say, "we will be analyzing a hypothetical mixed use development near transit"

joecastiglione · 2026-06-30T01:04:59Z

+
+## Setting Up the Scenario
+
+Three input files need to be changed in order to run this test: the land use file (landuse.csv) and the files defining the synthetic population (households.csv and persons.csv). While updating the land use file may seem very straightforward, it is very easy to overlook some necessary changes that could result in the model understating the impact of the change. A modeler doesn't need to just edit the household and employment fields in the study area--they also need to edit any field derived from those fields. For example, every new household will have at least one person in it, so the population field will need to be updated as well (along with population density if that is present). If the total population is to be kept the same, households will need to be removed outside of the study area as well.


Is the file landuse.csv or land_use.csv?

Perhaps clarify: "If the total population is to be kept the same in the entire regional modeling area".

In general, I think it would be good to identify the specific fields that need to be updated.

The subsequent discussion deals with the synthetic population first, and then the land use, so perhaps reflect this order in this intro paragraph

I'm a little confused by why the intro paragraph starts to get into the weeds a bit (e.g., talking about updating density in the LU field, and then discussing how density is buffer-based in the SANDAG model).
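To make the "kept the same" case concrete, here is one hedged sketch of removing households outside the study area. Random removal is purely illustrative and not necessarily what the guide intends. Column names `household_id` and `home_zone_id` are assumptions, and the sketch holds the regional household count constant rather than the person count.

```python
import pandas as pd


def remove_households_outside(households, persons, study_zones, n_remove, seed=0):
    """Drop n_remove randomly chosen households located outside study_zones."""
    outside = households[~households["home_zone_id"].isin(study_zones)]
    # Sampling without replacement; a fixed seed keeps the result repeatable.
    drop_ids = outside.sample(n=n_remove, random_state=seed)["household_id"]
    kept_households = households[~households["household_id"].isin(drop_ids)]
    # Keep only persons whose household survived.
    kept_persons = persons[
        persons["household_id"].isin(kept_households["household_id"])
    ]
    return kept_households, kept_persons


# Toy data (illustrative only).
households = pd.DataFrame(
    {
        "household_id": [1, 2, 3, 4, 5, 6],
        "home_zone_id": [10, 10, 11, 11, 579, 4502],
    }
)
persons = pd.DataFrame(
    {
        "person_id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
        "household_id": [1, 1, 2, 3, 3, 4, 5, 6, 6],
    }
)
kept_households, kept_persons = remove_households_outside(
    households, persons, study_zones=[579, 4502], n_remove=2
)
print(len(kept_households), len(kept_persons))
```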

joecastiglione · 2026-06-30T01:05:01Z

+
+Three input files need to be changed in order to run this test: the land use file (landuse.csv) and the files defining the synthetic population (households.csv and persons.csv). While updating the land use file may seem very straightforward, it is very easy to overlook some necessary changes that could result in the model understating the impact of the change. A modeler doesn't need to just edit the household and employment fields in the study area--they also need to edit any field derived from those fields. For example, every new household will have at least one person in it, so the population field will need to be updated as well (along with population density if that is present). If the total population is to be kept the same, households will need to be removed outside of the study area as well.
+
+Because Activity-based models use synthetic populations, those input files will need to be updated to reflect the different distribution in the population. There are multiple ways that this could be done. The ActivitySim consortium maintains the PopulationSim population synthesis software, which includes a `repop` mode that can be used to add households to an existing synthetic population. This demonstration will show how to do this, though any user is welcome to add the additional households in whatever way works best for them (such as through a script).


Clarify what is meant by "the different distribution in the population" - is this referring to the geographic distribution? The demographic distribution(s)?

I find it slightly confusing to say "any user is welcome to add the additional households in whatever way works best for them (such as through a script)". I understand the challenge that there could be any number of ways to adjust a population, but rather than just say "through a script", maybe lay out some options like, "you can repop, you can shift locations of existing synth pop, you can generate an entirely new pop"

joecastiglione · 2026-06-30T01:05:06Z

+
+1. The first thing to do would be to update the new synthetic population. This can be done using PopulationSim's repop mode, which adds additional population on top of existing PopulationSim outputs (a pipeline file is needed). A more detailed explanation of PopulationSim's repop mode can be found in [PopulationSim's documentation](https://activitysim.github.io/populationsim/application_configuration.html#configuring-settings-file-for-repop-mode), but this guide will briefly provide some examples of how PopulationSim can be configured for this particular scenario.
+
+First, the `run_list` in the settings file needs to be adjusted to have PopulationSim run the repop steps:


Mention the specific name of the settings file. Although individual implementations may vary in terms of file names, field names, etc., try to be as explicit as possible.

joecastiglione · 2026-06-30T01:05:14Z

+    - write_tables.repop
+```
+
+Next, `repop_control_file_name: repop_controls.csv` should be added to the settings file. This tells PopulationSim which file to configure what the control totals will be within the configs directory (configs\repop_controls.csv). The following configuration will help control for characteristic of the population within a TOD area. For example, TOD is more likely to attract smaller households who are more likely to be workers, more likely to be held by younger adults, and less likely to have children than the general population.


Does the user need to first create the repop_controls.csv file? How does the configuration below reflect the anticipated TOD population?

joecastiglione · 2026-06-30T01:05:17Z

+```
+
+Next, `repop_control_file_name: repop_controls.csv` should be added to the settings file. This tells PopulationSim which file to configure what the control totals will be within the configs directory (configs\repop_controls.csv). The following configuration will help control for characteristic of the population within a TOD area. For example, TOD is more likely to attract smaller households who are more likely to be workers, more likely to be held by younger adults, and less likely to have children than the general population.
+| target     | geography | seed_table | importance | control_field | expression                                         |


Maybe explain why there are no entries in this table for HHSize_3 or greater, and no age controls for populations <18 or >54 (how does that work exactly?).

joecastiglione · 2026-06-30T01:05:21Z

+| Age_35to44 | mgra      | persons    | 100000     | Age_35to44    | (persons.AGEP >= 35) & (persons.AGEP <= 44)        |
+| Age_45to54 | mgra      | persons    | 100000     | Age_45to54    | (persons.AGEP >= 45) & (persons.AGEP <= 54)        |
+
+The totals in each of the zones are then defined in the control total file, which can be defined in the settings file as follows:


What is the name of the control total file? Where is it located? What is the difference between a filename and a tablename?

joecastiglione · 2026-06-30T01:05:25Z

+| 579  | 500      | 250      | 150      | 50       | 240      | 120      | 400       | 150        | 200        | 200        | 150        |
+| 4502 | 500      | 250      | 150      | 50       | 240      | 120      | 400       | 150        | 200        | 200        | 150        |
+
+The output synthetic population files then need to be placed in the `data` directory for the ActivitySim run.


Meaning households.csv and persons.csv? The more explicit you are, the better.

joecastiglione · 2026-06-30T01:05:28Z

+
+The output synthetic population files then need to be placed in the `data` directory for the ActivitySim run.
+
+2. The land use file now needs to be updated to reflect the updated population. These lines of code update the household and population values within the land use file (assuming that the synthetic population exists as data frames called `households` and `persons` and the land use file is a data frame called `land_use`).


Maybe you should provide instructions on how someone can get access to the required data frames?
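For instance, the guide could show something like the following. It is a sketch under assumptions: the file names `households.csv` and `persons.csv` come from the draft, while the land use file name and the `zone_id` index column are placeholders, since the actual names depend on the implementation. The temporary directory exists only so that the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_inputs(data_dir, land_use_file="land_use.csv", zone_col="zone_id"):
    """Read the synthetic population and land use tables from data_dir."""
    data_dir = Path(data_dir)
    households = pd.read_csv(data_dir / "households.csv")
    persons = pd.read_csv(data_dir / "persons.csv")
    # Index the land use table by zone so zone-level series align with it.
    land_use = pd.read_csv(data_dir / land_use_file, index_col=zone_col)
    return households, persons, land_use


# Demonstration with throwaway files; real runs would point at the data directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    pd.DataFrame(
        {"household_id": [1, 2], "home_zone_id": [579, 4502]}
    ).to_csv(tmp / "households.csv", index=False)
    pd.DataFrame(
        {"person_id": [1, 2, 3], "household_id": [1, 1, 2]}
    ).to_csv(tmp / "persons.csv", index=False)
    pd.DataFrame(
        {"zone_id": [579, 4502], "hh": [0, 0], "pop": [0, 0]}
    ).to_csv(tmp / "land_use.csv", index=False)
    households, persons, land_use = load_inputs(tmp)

print(land_use)
```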

joecastiglione · 2026-06-30T01:05:34Z

+land_use["pop"] = persons.groupby("home_zone_id").count()["person_id"]
+del persons["home_zone_id"]
+```
+Further variables will need to be edited as well. For example, the SANDAG example contains variables on the number of housing units in each MAZ, including those that are vacant (so this will be higher than the number of households). There are also variables for population density. It may initially seem that one could just calculate them by dividing the updated population by the total area. While some models may use population density calculated in that way, the population density variables in the SANDAG model are actually the population within a buffer and are calculated via a preprocessing step that's external to ActivitySim (which should be rerun in this particular scenario). One should pay close attention to how each of the variables are defined to reduce the risk of a misunderstanding causing incorrect model results.


I think you should explicitly list the fields that need to be updated given this example (acknowledging that other models may have different data items available, different naming conventions, etc).
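As a starting point for such a list, the household and population fields could be recomputed roughly as sketched below. The `pop` field name appears in the draft; `hh`, `household_id`, and the zone index are assumed names. Reindexing with `fill_value=0` is a deliberate choice so that zones without any households receive zeros rather than missing values. Other derived fields (housing units, buffer-based densities) are not covered here.

```python
import pandas as pd


def update_household_and_population(land_use, households, persons,
                                    hh_col="hh", pop_col="pop"):
    """Return a copy of land_use with household and population counts refreshed."""
    out = land_use.copy()
    hh_by_zone = households.groupby("home_zone_id").size()
    # Attach each person's home zone via the household table.
    # (Assumes persons does not already carry a home_zone_id column.)
    persons_with_zone = persons.merge(
        households[["household_id", "home_zone_id"]], on="household_id"
    )
    pop_by_zone = persons_with_zone.groupby("home_zone_id").size()
    # Zones absent from the synthetic population get 0 instead of NaN.
    out[hh_col] = hh_by_zone.reindex(out.index, fill_value=0)
    out[pop_col] = pop_by_zone.reindex(out.index, fill_value=0)
    return out


# Toy data (illustrative only): zone 10 has no households.
land_use = pd.DataFrame(
    {"hh": [5, 0, 0], "pop": [9, 0, 0]},
    index=pd.Index([10, 579, 4502], name="zone_id"),
)
households = pd.DataFrame(
    {"household_id": [1, 2, 3], "home_zone_id": [579, 579, 4502]}
)
persons = pd.DataFrame(
    {"person_id": [1, 2, 3, 4], "household_id": [1, 1, 2, 3]}
)
updated = update_household_and_population(land_use, households, persons)
print(updated)
```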

joecastiglione · 2026-06-30T01:05:40Z

+
+3. The land use data then needs to be adjusted again for the retail jobs. This is more straightforward than adjusting the population, as ActivitySim models don't typically have a synthetic set of establishments. As this particular scenario adds 1000 additional retail jobs to the TOD, the values of the retail and total employment fields simply need to be updated for the zones within the study area:
+```
+# Adjust employment


What is this block of code supposed to be applied to? Again, I think you should err on the side of being as explicit as possible.
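One way to make the target explicit would be to wrap the adjustment in a function that takes the land use data frame directly, as in this sketch. The field names `emp_retail` and `emp_total`, the zone index, and the even split of the 1000 jobs across two zones are all assumptions made for illustration.

```python
import pandas as pd


def add_retail_jobs(land_use, jobs_by_zone,
                    retail_col="emp_retail", total_col="emp_total"):
    """Return a copy of land_use with jobs added to retail and total employment."""
    missing = set(jobs_by_zone) - set(land_use.index)
    if missing:
        raise KeyError(f"Zones not found in land use table: {sorted(missing)}")
    # Zones not listed in jobs_by_zone receive no additional jobs.
    added = pd.Series(jobs_by_zone).reindex(land_use.index, fill_value=0)
    out = land_use.copy()
    out[retail_col] = out[retail_col] + added
    out[total_col] = out[total_col] + added
    return out


# Toy land use table indexed by zone (illustrative values only).
land_use = pd.DataFrame(
    {"emp_retail": [50, 200, 100], "emp_total": [300, 400, 250]},
    index=pd.Index([10, 579, 4502], name="zone_id"),
)
scenario_land_use = add_retail_jobs(land_use, {579: 500, 4502: 500})
print(scenario_land_use)
```

Adding the same increment to both fields keeps total employment consistent with its retail component.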

joecastiglione · 2026-06-30T01:06:01Z

+uv run activitysim run -c configs\common -c configs\resident -d data_full -o output --ext extensions
+```
+
+## Analyzing the Results


This list of metrics feels a bit brief, given the scope that "Ideally, the Application & Analysis Guide should also provide guidance on potential analysis metrics as well as appropriate levels of aggregation."

joecastiglione · 2026-06-30T01:06:02Z

+
+The following code blocks demonstrate how to calculate key metrics from the model outputs. They all assume that the ActivitySim output files will be read in as a data frame where the name will be the same as the file name but without the prefix or the file extension (e.g. final_trips.csv will be read as trips).
+
+### Vehicle Miles Traveled


Other metrics may be of interest, such as VMT / capita. Part of the goal of these guides is to identify a set of metrics that are relevant for the tests, and some guidance on how to interpret them, so that the guide can help both model users as well as managers understand what they should be looking at.

A key part of any metric calculation is the comparison to something else - to contextualize the metrics. Eg. Is per capita VMT greater or less than in other areas of the region? What is the percentage increase in aggregate VMT in the subarea as well as in the overall modeled area?
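A per capita comparison of this kind might be sketched as follows. It attributes VMT to residents by home zone, not to the links inside the study area. The `distance` column, the mode labels, and the occupancy factors are hypothetical placeholders to be replaced with the model's own.

```python
import pandas as pd

# Hypothetical auto mode labels and vehicle occupancies (assumptions).
OCCUPANCY = {"DRIVEALONE": 1.0, "SHARED2": 2.0, "SHARED3": 3.5}


def vmt_per_capita(trips, persons, households, zones=None):
    """Resident VMT per person, optionally limited to households in zones."""
    hh = households
    if zones is not None:
        hh = households[households["home_zone_id"].isin(zones)]
    ids = hh["household_id"]
    auto_trips = trips[
        trips["household_id"].isin(ids)
        & trips["trip_mode"].isin(list(OCCUPANCY))
    ]
    # Divide person miles by occupancy to approximate vehicle miles.
    vmt = (auto_trips["distance"] / auto_trips["trip_mode"].map(OCCUPANCY)).sum()
    population = persons["household_id"].isin(ids).sum()
    return vmt / population


# Toy data (illustrative only).
households = pd.DataFrame({"household_id": [1, 2], "home_zone_id": [579, 10]})
persons = pd.DataFrame(
    {"person_id": [1, 2, 3, 4], "household_id": [1, 1, 2, 2]}
)
trips = pd.DataFrame(
    {
        "household_id": [1, 1, 2, 2, 2],
        "trip_mode": ["DRIVEALONE", "WALK_TRANSIT", "DRIVEALONE",
                      "SHARED2", "DRIVEALONE"],
        "distance": [4.0, 6.0, 10.0, 8.0, 6.0],
    }
)
station_vmt = vmt_per_capita(trips, persons, households, zones=[579])
region_vmt = vmt_per_capita(trips, persons, households)
print(station_vmt, region_vmt)
```

Running the same function on the baseline and scenario outputs would give the percentage change in each geography.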

joecastiglione · 2026-06-30T01:06:07Z

+```
+mode_share = trips["trip_mode"].value_counts(normalize = True)
+```
This will return the share of trips that use each mode (as a fraction between 0 and 1). However, a single localized development won't move the needle much, so it may be hard to tell if there was an impact. The following metric computes the *tour* mode share to work of households living in zones close to the transit stop, which should show a much larger difference from the baseline (there are no households in the study area in the baseline run so the baseline mode share would be undefined).


I wouldn't think the point of this example would be to demonstrate how it moves the needle regionally, but rather, is transit mode share higher in the new development than in the region, which would be a step in the right direction.
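That comparison could be computed along these lines. This is a sketch only: `trip_mode` is the column used in the draft, while `household_id`, `home_zone_id`, and the transit mode labels are assumed names.

```python
import pandas as pd


def transit_share(trips, households, transit_modes, zones=None):
    """Share of trips made by transit, optionally for residents of zones only."""
    if zones is not None:
        ids = households.loc[
            households["home_zone_id"].isin(zones), "household_id"
        ]
        trips = trips[trips["household_id"].isin(ids)]
    # The mean of a boolean series is the fraction of True values.
    return trips["trip_mode"].isin(transit_modes).mean()


# Toy data (illustrative only); mode labels are hypothetical.
households = pd.DataFrame({"household_id": [1, 2], "home_zone_id": [579, 10]})
trips = pd.DataFrame(
    {
        "household_id": [1, 1, 2, 2, 2],
        "trip_mode": ["DRIVEALONE", "WALK_TRANSIT", "DRIVEALONE",
                      "SHARED2", "DRIVEALONE"],
    }
)
transit_modes = ["WALK_TRANSIT", "DRIVE_TRANSIT"]
station_share = transit_share(trips, households, transit_modes, zones=[579])
region_share = transit_share(trips, households, transit_modes)
print(station_share, region_share)
```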

joecastiglione · 2026-06-30T01:06:09Z

+```
+auto_ownership = households["auto_ownership"].value_counts(normalize = True)
+```
+However, auto ownership has the same issue with mode share where the TOD development will barely move the needle on the regional auto ownership rates. Therefore, a similar calculation would need to be done:


Clarify that this similar calculation is meant to calculate this measure for the station area (to compare to the region)?
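If so, a side-by-side table might make the comparison easier to read. The sketch below reuses the `auto_ownership` column from the draft and assumes a `home_zone_id` column on the household table; the zone list is illustrative.

```python
import pandas as pd


def auto_ownership_comparison(households, zones):
    """Auto ownership distribution for the station area next to the region."""
    in_area = households["home_zone_id"].isin(zones)
    region = households["auto_ownership"].value_counts(normalize=True)
    station = households.loc[in_area, "auto_ownership"].value_counts(
        normalize=True
    )
    # Ownership levels absent from one geography show up as a share of 0.
    table = pd.DataFrame({"station_area": station, "region": region}).fillna(0.0)
    return table.sort_index()


# Toy data (illustrative only).
households = pd.DataFrame(
    {
        "household_id": [1, 2, 3, 4],
        "home_zone_id": [579, 579, 10, 10],
        "auto_ownership": [0, 1, 2, 2],
    }
)
comparison = auto_ownership_comparison(households, zones=[579])
print(comparison)
```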

JoeJimFlood added 9 commits April 24, 2026 15:06

Set up file structure, and wrote introductory text for land use test

b334b07

Added guide showing how to configure files for running PopulationSim …

f0fb466

…in repop mode

Removed reference to script that's no longer being used

9f94fd2

Restructured file and added headings to better reflect the outline

dcac4d5

Finished text of first draft of land use change scenario

de25920

Added study area map

dbac2d0

Added crediting of sattelite imagery

bd4ef4b

Added alternative to reindex function to get home zone IDs for tours

d35882e

Removed merging of household and tours table in station area tour mod…

d703a4b

…e calculation as it's not necessary

guyrousseau reviewed Jun 25, 2026

This was referenced Jun 25, 2026

2026-06-23 Product/Community Team ActivitySim/meeting-notes#112

Closed

2026-06-23 Australia Team ActivitySim/meeting-notes#113

Closed

joecastiglione reviewed Jun 30, 2026

4 participants