Transit Data Bundle Guide
OneBusAway has the concept of a transit data bundle, which is a collection of all the data artifacts for a transit agency (or group of transit agencies) in the internal format needed to power OneBusAway. These bundles are typically created from external data such as GTFS feeds. This document walks you through the steps for creating a new transit data bundle.
Creating the Bundle
To create the bundle, you’ll need to download the onebusaway-transit-data-federation application. Go to the Downloads page to download the application.
The jar file is automatically configured to run the main class for building transit data bundles:
org.onebusaway.transit_data_federation.bundle.FederatedTransitDataBundleCreatorMain
Thus, you can simply run the builder with:
java -jar onebusaway-transit-data-federation-builder.jar ...
Note: Depending on the size of your transit network, you may need to increase the amount of memory available to the Java VM with an argument like java -Xmx1G -jar ... If your Java VM supports it, you might also consider adding the -server argument, which can make long-running Java processes such as the builder run faster.
Quick Configuration
By default, the builder accepts two command line options:
path/to/your/gtfs.zip - path to your GTFS feed
bundle_output_path - the output directory where bundle artifacts will be written
This quick start mode can be used to quickly build a transit data bundle for a single transit agency, but without much configuration flexibility. If you need more flexibility, see Advanced Configuration below.
Advanced Configuration
The builder also accepts an XML file as a command line argument for more advanced configuration options:
bundle.xml - path to your bundle config XML file
bundle_output_path - the output directory where bundle artifacts will be written
The configuration details for the bundle are captured in an XML file. The bundle.xml config file gives you a lot of control over how your bundle is built:
- combine multiple GTFS feeds
- override and remap GTFS agency ids
- add additional build phases
Let’s look at a quick example:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:context="http://www.springframework.org/schema/context"
    xsi:schemaLocation="
        http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd">
    <bean id="gtfs-bundles" class="org.onebusaway.transit_data_federation.bundle.model.GtfsBundles">
        <property name="bundles">
            <list>
                <ref bean="gtfsA" />
                <!-- References to other GTFS feeds could go here -->
            </list>
        </property>
    </bean>
    <bean id="gtfsA" class="org.onebusaway.transit_data_federation.bundle.model.GtfsBundle">
        <property name="path" value="path/to/your/gtfs.zip" />
        <property name="defaultAgencyId" value="1" />
        <property name="agencyIdMappings">
            <map>
                <!-- Map GTFS Agency IDs to their APTA Agency Id -->
                <entry key="KCM" value="1" />
                <entry key="EOS" value="23" />
                <entry key="ST" value="40" />
            </map>
        </property>
    </bean>
    <!-- Need a mechanism to combine stops from different feeds? -->
    <bean id="entityReplacementStrategyFactory" class="org.onebusaway.transit_data_federation.bundle.tasks.EntityReplacementStrategyFactory">
        <property name="entityMappings">
            <map>
                <entry key="org.onebusaway.gtfs.model.Stop" value="path/to/PugetSoundStopConsolidation.wiki" />
            </map>
        </property>
    </bean>
    <bean id="entityReplacementStrategy" factory-bean="entityReplacementStrategyFactory" factory-method="create"/>
</beans>
This configuration file is just a Spring bean configuration file, so you can perform arbitrarily complex configuration here. Documentation on specific advanced configuration features can be found below.
GTFS Agency Id Mapping
If you are building a transit data bundle with data from multiple agencies, you might wish to remap the agency ids used in each feed. Since agencies are free to pick arbitrary agency ids in their GTFS, if they specify ids at all, it is often necessary to pick ids of your own. Consider the example:
<bean id="gtfsA" class="org.onebusaway.transit_data_federation.bundle.model.GtfsBundle">
    <property name="path" value="path/to/king_county_metro_gtfs.zip" />
    <property name="defaultAgencyId" value="1" />
    <property name="agencyIdMappings">
        <map>
            <!-- Map GTFS Agency IDs to their APTA Agency Id -->
            <entry key="KCM" value="1" />
            <entry key="EOS" value="23" />
            <entry key="ST" value="40" />
        </map>
    </property>
</bean>
In this example, we define a GTFS bundle which points to a GTFS file through the path property. This GTFS is from King County Metro and includes data for King County Metro, Sound Transit, and City of Seattle. For http://onebusaway.org we use the convention of mapping agencies to their APTA assigned id. In this case, that’s “1” for King County Metro, “23” for City of Seattle, and “40” for Sound Transit. We map the agency ids in two ways. First, we specify a default agency id of “1”, which indicates that GTFS elements like stops and shapes that don’t have an agency assignment by default will be mapped to an id of “1”. We then additionally specify an agency id mapping for the agency ids specified in the feed.
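As a rough illustration of the two rules described above, the following Java sketch resolves an agency id using a default plus an explicit mapping. It is not OneBusAway's implementation: the class AgencyIdResolver and its resolve method are hypothetical names, and passing unmapped ids through unchanged is an assumption made for the sketch.

```java
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of OneBusAway.
final class AgencyIdResolver {
    private final String defaultAgencyId;
    private final Map<String, String> agencyIdMappings;

    AgencyIdResolver(String defaultAgencyId, Map<String, String> agencyIdMappings) {
        this.defaultAgencyId = defaultAgencyId;
        this.agencyIdMappings = agencyIdMappings;
    }

    /** Returns the agency id to use for an entity whose feed agency id may be missing. */
    String resolve(String feedAgencyId) {
        // Rule 1: entities without an agency assignment get the default agency id.
        if (feedAgencyId == null || feedAgencyId.isEmpty()) {
            return defaultAgencyId;
        }
        // Rule 2: ids listed in the mapping are remapped.
        // Assumption: ids with no mapping entry are left unchanged.
        return agencyIdMappings.getOrDefault(feedAgencyId, feedAgencyId);
    }

    public static void main(String[] args) {
        AgencyIdResolver resolver =
            new AgencyIdResolver("1", Map.of("KCM", "1", "EOS", "23", "ST", "40"));
        System.out.println(resolver.resolve("ST"));
        System.out.println(resolver.resolve(null));
    }
}
```

With the mappings from the example, a route tagged “ST” in the feed would end up under agency “40”, while a stop with no agency would fall back to “1”.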
Entity Replacement
When working with GTFS feeds from multiple agencies in the same geographic region, it is often the case that multiple feeds refer to the same physical stop. Since GTFS doesn’t have a mechanism to indicate that two stops are the same across feeds, this can lead to confusion in the user interface, where multiple stop icons show up on the map at the same location or a user looks for a route at one stop when they should be looking at the other.
To help with these situations, OneBusAway includes a mechanism for entity replacement that allows you to indicate that two GTFS entities are actually the same. This is most often used for stops in practice. The configuration looks like:
<bean id="entityReplacementStrategyFactory" class="org.onebusaway.transit_data_federation.bundle.tasks.EntityReplacementStrategyFactory">
    <property name="entityMappings">
        <map>
            <entry key="org.onebusaway.gtfs.model.Stop" value="path/to/mapping_file" />
        </map>
    </property>
</bean>
<bean id="entityReplacementStrategy" factory-bean="entityReplacementStrategyFactory" factory-method="create"/>
Here we configure an EntityReplacementStrategyFactory with a specific mapping for GTFS stops. The map entry associates the Stop entity type with the path to a mapping file. The mapping file has the following format:
agencyIdA_stopIdA agencyIdB_stopIdB [agencyIdC_stopIdC...]
... more entries ...
Each line contains a list of stop ids separated by spaces. The first id indicates the stop to keep, while subsequent ids indicate stops to merge into the first stop.
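To make the line format concrete, here is a minimal Java sketch that reads such lines into a map from each replaced stop id to the stop id that is kept. It is not the parser OneBusAway uses: the class StopMappingSketch, the sample stop ids, and the decision to skip blank lines are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical reader for the mapping file format described above.
final class StopMappingSketch {

    /** Maps each replaced stop id to the id of the stop that is kept. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> replacements = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            String[] ids = trimmed.split("\\s+");
            String keep = ids[0]; // the first id on the line is the stop to keep
            for (int i = 1; i < ids.length; i++) {
                replacements.put(ids[i], keep); // later ids are merged into the first
            }
        }
        return replacements;
    }

    public static void main(String[] args) {
        // Made-up stop ids in agencyId_stopId form.
        List<String> lines = List.of("1_1000 40_2000 23_3000", "1_1001 40_2001");
        System.out.println(parse(lines));
    }
}
```

In this sketch, a lookup of a merged stop id returns the id that survives, while ids that are kept never appear as keys.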
Other Command-Line Options
The bundle builder accepts a number of command-line options that control which phases of the build are run.
-skip phase_name - skips the specified build phase. Can be repeated.
-only phase_name - only runs the specified build phase. Can be repeated.
-skipTo phase_name - jump ahead to the specified build phase. Only specify once.
-include phase_name - include the specified build phase, useful when the phase is not enabled by default. Can be repeated.
Adding a Custom Build Phase
You can add your own custom build phases into the build process. To do so, you specify a task definition in your bundle.xml config:
<bean class="org.onebusaway.transit_data_federation.bundle.model.TaskDefinition">
    <!-- Required -->
    <property name="taskName" value="TASK_NAME_GOES_HERE" />
    <!-- Optional -->
    <property name="afterTaskName" value="SOME_EXISTING_TASK_NAME" />
    <!-- Optional -->
    <property name="beforeTaskName" value="SOME_EXISTING_TASK_NAME" />
    <!-- Optional -->
    <property name="enabled" value="true" />
    <!-- Required -->
    <property name="task" ref="taskBeanName" />
</bean>
<bean id="taskBeanName" class="your.task.Definition" />
As you can see, the TaskDefinition bean allows you to define custom tasks and optionally control where they appear in the build order. The only requirement is that your task instance implement Runnable.
In addition to adding your task definition to your bundle.xml, you’ll also need to add the jar or class files with your task implementation to the classpath when you run the build process.
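A minimal sketch of a task class might look like the following, assuming nothing beyond the Runnable requirement stated above. The class name ExampleBundleTask, its message property, and its run counter are hypothetical and exist only for illustration; a real task would do its work inside run().

```java
// Hypothetical custom build task; the name and fields are placeholders.
// Declared package-private so the sketch compiles as a single file; in a real
// project it would typically be a public class in its own package and file.
final class ExampleBundleTask implements Runnable {
    private String message = "custom build phase ran";
    private int runCount = 0;

    // A setter like this lets a Spring <property name="message" .../> configure the task.
    public void setMessage(String message) {
        this.message = message;
    }

    public int getRunCount() {
        return runCount;
    }

    // Invoked when the build reaches this task.
    @Override
    public void run() {
        runCount++;
        System.out.println(message);
    }

    public static void main(String[] args) {
        // Stand-alone check that the task can be run outside the builder.
        Runnable task = new ExampleBundleTask();
        task.run();
    }
}
```

The class attribute of the task bean in bundle.xml (your.task.Definition in the example above) would then be the fully qualified name of a class like this one.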