Working with GTFS Realtime Transit Data
Decoding GTFS Realtime transit data can be a challenging endeavour in front-end development. By using protobuf.js, AWS Lambda and Serverless, we can make the data accessible to web applications and visualizations.
Realtime transit data is a powerful data source that can help operators and users interact with and understand transit systems. The General Transit Feed Specification (GTFS) is the most common standardized format for public transit schedules and geographic information. Developed by Google and TriMet in 2005, the standard has been adopted by thousands of transit agencies around the world.
GTFS data is separated into two categories: static files and realtime data. Static GTFS files describe the schedules and geography (stop locations, route shapes) of a transit system and are simply packaged as a ZIP archive of CSV-formatted text files. They can be thought of as relational database tables containing routes, trips, stop times and stop locations. GTFS Realtime is an extension of GTFS: a feed specification that provides information on:
- Trip updates (delays, cancellations, route changes)
- Service alerts (stop moved, scheduled or unforeseen events affecting service)
- Vehicle positions (geographic locations of vehicles)
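To make these three kinds of information concrete, here is a partial TypeScript model of a decoded feed. It is a sketch, not the full specification: only a handful of fields from gtfs-realtime.proto are included, the names assume protobuf.js's default conversion of the .proto field names to camelCase (for example `trip_update` becomes `tripUpdate`), and 64-bit timestamps and enums are assumed to have already been converted to plain numbers and names. The helper `splitFeed` is a hypothetical name used for illustration.

```typescript
// Partial model of a decoded GTFS Realtime FeedMessage (sketch only).
// Field names assume protobuf.js's default camelCase conversion of the
// snake_case names in gtfs-realtime.proto (trip_update -> tripUpdate).

interface TripDescriptor {
  tripId?: string;
  routeId?: string;
}

interface StopTimeEvent {
  delay?: number; // seconds relative to the schedule; positive means late
  time?: number; // POSIX time in seconds (a 64-bit integer in the .proto)
}

interface TripUpdate {
  trip: TripDescriptor;
  stopTimeUpdate?: {
    stopSequence?: number;
    stopId?: string;
    arrival?: StopTimeEvent;
    departure?: StopTimeEvent;
  }[];
}

interface VehiclePosition {
  trip?: TripDescriptor;
  // speed is in metres per second; bearing in degrees clockwise from north
  position?: { latitude: number; longitude: number; bearing?: number; speed?: number };
  timestamp?: number;
}

interface Alert {
  effect?: string; // enum name such as "DETOUR" or "STOP_MOVED" (assumes enums converted to strings)
}

interface FeedEntity {
  id: string;
  tripUpdate?: TripUpdate;
  vehicle?: VehiclePosition;
  alert?: Alert;
}

interface FeedMessage {
  header: { gtfsRealtimeVersion: string; timestamp?: number };
  entity: FeedEntity[];
}

// Hypothetical helper: split a feed into the three kinds of realtime data.
function splitFeed(feed: FeedMessage) {
  return {
    tripUpdates: feed.entity.filter((e) => e.tripUpdate !== undefined),
    alerts: feed.entity.filter((e) => e.alert !== undefined),
    vehiclePositions: feed.entity.filter((e) => e.vehicle !== undefined),
  };
}
```

Every update in a feed arrives as a `FeedEntity`, so a front end typically starts by sorting entities into these buckets before drawing vehicles or listing delays.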
Accessing GTFS data and using it in front-end development turned out to be a lot more difficult than I expected. Firstly, GTFS Realtime feeds are encoded as protocol buffers. If you haven’t worked with protocol buffers (also called protobufs) before, you’ll need to do a little research to find out how they are used. Also developed by Google, they are a serialization format (like XML or JSON) used for data communication and storage. Protobufs are useful because they encode data into a compact binary format, making messages very lightweight and quick to transfer. To use them, you first define the structure of the data in a .proto file and then use that file to encode (serialize) messages. The same .proto file must be used to decode the message.
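To see why the .proto file is needed, it helps to look at the binary wire format itself. The sketch below hand-decodes the smallest possible message: field number 1 holding the integer 150, which the protobuf encoding documentation uses as its first example (the bytes `08 96 01`). It is for illustration only; in practice protobuf.js does this work, and the function names here are hypothetical.

```typescript
// Read a base-128 varint starting at `offset`. Each byte carries 7 bits of
// the value, least-significant group first; a set high bit (0x80) means
// more bytes follow.
function readVarint(buf: Uint8Array, offset: number): { value: number; next: number } {
  let value = 0;
  let scale = 1; // 128 ** (number of bytes consumed so far)
  for (let pos = offset; pos < buf.length; pos++) {
    const byte = buf[pos];
    // Multiply instead of shifting: JS bitwise operators work on 32 bits only.
    // Values above 2 ** 53 would still lose precision, which is one reason
    // protobuf.js can represent 64-bit fields as Long objects.
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: pos + 1 };
    scale *= 128;
  }
  throw new Error("truncated varint");
}

// Every field begins with a varint key: (field_number << 3) | wire_type.
function readFieldKey(buf: Uint8Array, offset: number) {
  const { value, next } = readVarint(buf, offset);
  return { fieldNumber: Math.floor(value / 8), wireType: value % 8, next };
}

// Field 1, wire type 0 (varint), value 150.
const sample = new Uint8Array([0x08, 0x96, 0x01]);
const sampleKey = readFieldKey(sample, 0);
const sampleField = readVarint(sample, sampleKey.next);
console.log(sampleKey.fieldNumber, sampleKey.wireType, sampleField.value); // 1 0 150
```

Notice that the bytes contain only a field number and a wire type, never a field name like `tripUpdate`. Mapping numbers back to names and types is exactly what the .proto file provides, which is why a decoder cannot work without it.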
Many transit agencies release their GTFS Realtime protobuf files as part of their open data systems. The files are updated periodically, depending on each agency’s refresh rate. The difficult part is accessing that data in a front-end development environment.
As you may already know, I’m a big fan of React and I’ve been experimenting with it in some of my recent projects. The tricky part for me was decoding the GTFS Realtime protobufs in the browser so they could be put to some useful purpose. Google has published a GitHub repository of bindings that decode/encode GTFS Realtime protobufs in several languages (Node, Python, Java, etc.). Unfortunately, I found the bindings to be more trouble than they were worth; it was a lot easier to use the ‘generic’ protobuf.js package with the GTFS Realtime protobufs. In the end, though, it didn’t matter: when I fetched the protobuf from the browser to decode it with protobuf.js, I got the dreaded CORS error:
Cross-Origin Request Blocked: The Same Origin Policy
disallows reading the remote resource at http://localhost:5000/.
(Reason: CORS header ‘Access-Control-Allow-Origin’ missing).
MDN Web Docs has an excellent article on Cross-Origin Resource Sharing (CORS):
Cross-Origin Resource Sharing (CORS) is a mechanism that uses additional HTTP headers to tell a browser to let a web application running at one origin (domain) have permission to access selected resources from a server at a different origin…For security reasons, browsers restrict cross-origin HTTP requests initiated from within scripts. For example, XMLHttpRequest and the Fetch API follow the same-origin policy. This means that a web application using those APIs can only request HTTP resources from the same origin the application was loaded from, unless the response from the other origin includes the right CORS headers.
So the CORS error occurred because the response (the GTFS Realtime protobuf) did not include the Access-Control-Allow-Origin header. There’s no real way around this other than having the originating server authorize the domain running the web application, which is an unlikely scenario.
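The browser’s decision can be modelled as a small function. This is a simplified sketch, assuming a plain GET with no credentials and no preflight; real browsers implement the full Fetch standard, which covers much more. The page URL `http://localhost:3000/` is a hypothetical local development server, and the feed URL matches the one in the error message above.

```typescript
// Simplified model of the CORS check for a simple cross-origin GET
// (no credentials, no preflight). Not a complete implementation of the
// Fetch standard.

// Two URLs share an origin when scheme, host and port all match.
function isSameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

// Can a script running on `pageUrl` read the response from `resourceUrl`?
function canReadResponse(
  pageUrl: string,
  resourceUrl: string,
  responseHeaders: Record<string, string>,
): boolean {
  if (isSameOrigin(pageUrl, resourceUrl)) return true; // same-origin policy allows it
  let allowed: string | undefined;
  for (const headerName of Object.keys(responseHeaders)) {
    // HTTP header names are case-insensitive.
    if (headerName.toLowerCase() === "access-control-allow-origin") {
      allowed = responseHeaders[headerName];
    }
  }
  // The response must allow every origin ("*") or name the page's origin exactly.
  return allowed === "*" || allowed === new URL(pageUrl).origin;
}

// A page on port 3000 reading a feed on port 5000 is a cross-origin request.
console.log(canReadResponse("http://localhost:3000/", "http://localhost:5000/", {})); // false
console.log(
  canReadResponse("http://localhost:3000/", "http://localhost:5000/", {
    "Access-Control-Allow-Origin": "*",
  }),
); // true
```

The outcome depends only on headers sent by the server that hosts the feed, so nothing in the front-end code can make the response readable.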
So we need a way to retrieve the protobuf response and serve it from a server we control, where we can give the response the correct CORS headers. That’s usually a cumbersome task, but this is where AWS Lambda and the Serverless Framework come in. Serverless computing and Function as a Service (FaaS) platforms seem to be the latest buzz these days, and for good reason. Lambda is a service that runs your code on demand on servers that AWS manages. The Serverless Framework simply helps with deploying functions to various FaaS providers. Together they remove the hassle of provisioning and managing servers for simple tasks.
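A minimal sketch of such a proxy follows, written for illustration rather than taken from the repository. It assumes an API Gateway proxy integration (a response object with `statusCode`, `headers` and `body`) and Node 18 or later for the global `fetch`. The `makeHandler` name is hypothetical, and decoding is passed in as a function so the sketch runs without dependencies. With protobuf.js, that function could wrap a type obtained from `root.lookupType("transit_realtime.FeedMessage")` and call `FeedMessage.toObject(FeedMessage.decode(bytes), { longs: Number, enums: String })`.

```typescript
// Sketch of a Lambda function that proxies a GTFS Realtime feed and adds the
// CORS header the agency's server does not send. Assumes the API Gateway
// proxy integration response format and Node 18+ (global fetch).

interface ProxyResult {
  statusCode: number;
  headers: Record<string, string>;
  body: string;
}

type Fetcher = (url: string) => Promise<Uint8Array>;
type Decoder = (bytes: Uint8Array) => unknown;

// Download the raw protobuf bytes from the agency's feed URL.
async function fetchBytes(url: string): Promise<Uint8Array> {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`feed request failed with status ${response.status}`);
  return new Uint8Array(await response.arrayBuffer());
}

// Hypothetical factory: returns a handler that fetches, decodes and returns JSON.
function makeHandler(feedUrl: string, decode: Decoder, fetchFeed: Fetcher = fetchBytes) {
  return async (): Promise<ProxyResult> => {
    const headers = {
      // "*" lets any site read the JSON; use your app's origin to restrict it.
      "Access-Control-Allow-Origin": "*",
      "Content-Type": "application/json",
    };
    try {
      const feed = decode(await fetchFeed(feedUrl));
      return { statusCode: 200, headers, body: JSON.stringify(feed) };
    } catch (err) {
      // 502: the upstream feed could not be fetched or decoded.
      return { statusCode: 502, headers, body: JSON.stringify({ error: String(err) }) };
    }
  };
}
```

Injecting the fetcher and decoder keeps the CORS logic testable without network access; in a deployment, the handler returned by `makeHandler` would be exported and referenced from the Serverless configuration.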
Using this method, I was finally able to decode the GTFS Realtime protobuf and serve it as JSON content that can easily be consumed in a web app or visualization. If you’re interested, the Serverless code is available on GitHub.
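As one example of putting the JSON to use in a visualization, the sketch below turns vehicle positions into a GeoJSON FeatureCollection, a format most web mapping libraries accept. It assumes the JSON keeps protobuf.js’s default camelCase field names and models only a small subset of the feed; `toGeoJson` is a hypothetical helper name. Note that GeoJSON orders coordinates as longitude, then latitude.

```typescript
// Minimal subset of the decoded feed JSON (camelCase names as produced by
// protobuf.js by default; an assumption about the proxy's output).
interface FeedJson {
  entity: {
    id: string;
    vehicle?: {
      trip?: { routeId?: string };
      position?: { latitude: number; longitude: number; bearing?: number };
    };
  }[];
}

interface PointFeature {
  type: "Feature";
  geometry: { type: "Point"; coordinates: [number, number] };
  properties: { id: string; routeId: string | null; bearing: number | null };
}

// Convert vehicle positions into a GeoJSON FeatureCollection.
// GeoJSON (RFC 7946) puts longitude first, then latitude.
function toGeoJson(feed: FeedJson): { type: "FeatureCollection"; features: PointFeature[] } {
  const features: PointFeature[] = [];
  for (const entity of feed.entity) {
    const position = entity.vehicle?.position;
    if (!position) continue; // skip trip updates, alerts and vehicles without a fix
    features.push({
      type: "Feature",
      geometry: { type: "Point", coordinates: [position.longitude, position.latitude] },
      properties: {
        id: entity.id,
        routeId: entity.vehicle?.trip?.routeId ?? null,
        bearing: position.bearing ?? null,
      },
    });
  }
  return { type: "FeatureCollection", features };
}
```

In a React component, the result could be handed to a map layer after something like `fetch(url).then((r) => r.json()).then(toGeoJson)`, where `url` is your deployed endpoint.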