“Being stuck in traffic”. Whether you are a driver or a passenger enjoying the ride, you have certainly said or heard this expression, especially near a large urban area. In addition to affecting the mood of motorists, traffic congestion has substantial impacts on the general well-being of the population, on the environment and on the economy.
There are many reasons to look into traffic congestion. For instance, exo, the public transit operator for the Greater Montréal area and a major player in intelligent public transportation in Québec, is interested in traffic congestion for operational reasons, for mobility modeling, and for tactical and strategic purposes.
Geospatial analysts at exo, in partnership with researchers from Polytechnique Montréal, are studying traffic congestion using historical data on vehicle circulation in the province of Québec. This data, obtained from HERE Technologies, consists of vehicle speeds recorded on each road link in Québec, summarized over 60-minute or 5-minute epochs. Since Québec’s road network is divided into approximately 1.5 million directional links, HERE delivers between roughly 1 billion and more than 12 billion new rows per month, depending on whether 60-minute or 5-minute epochs are used, resulting in monthly volume growth of more than 1 TB. It goes without saying that with such data volumes, analyzing traffic congestion is not a walk in the park!
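These row counts can be sanity-checked with quick arithmetic. The sketch below is a rough estimate, assuming one speed record per directional link per epoch over a 30-day month; it is not an exact accounting of HERE's deliveries:

```python
# Rough volume estimate: one speed record per directional link per epoch.
# The link count is taken from the text (~1.5 million directional links);
# the 30-day month is a simplifying assumption.
links = 1_500_000

# 60-minute epochs: 24 epochs per day, ~30 days per month
rows_60min = links * 24 * 30
print(f"{rows_60min:,}")  # 1,080,000,000 -> roughly 1 billion rows/month

# 5-minute epochs: 12 times as many epochs per day
rows_5min = rows_60min * 12
print(f"{rows_5min:,}")  # 12,960,000,000 -> nearly 13 billion rows/month
```

This back-of-the-envelope calculation matches the stated range of roughly 1 to more than 12 billion new rows per month.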
Managing all this data in PostgreSQL, the technology initially used on premises by exo, comes with storage challenges as well as operational difficulties:
- Preparing large data volumes for analysis is time-consuming: even with a dedicated server, this step already takes almost 48 hours each month just for the speed data at 60-minute epochs.
- Querying the data afterwards, while expecting results within a reasonable time frame, requires significant computational capacity, even for speed data summarized over 60-minute epochs.
- Finding the fastest or optimal routes between millions of origins and destinations, a major use case for exo analysts, can take weeks or even months.
- Based on the results observed with 60-minute speed data, exo anticipates prohibitive costs for processing and using the 5-minute speed data, which has therefore been set aside for the time being.
Seeing the limitations of its current solution, exo suspects that using technologies available in the public cloud can help, but which ones?
This is where we, at Larochelle, jump in. Our role is to propose, set up and manage a modern data platform in the cloud, capable of providing the required processing capacities and which, of course, respects typical budgetary constraints.
Since exo already uses some resources in Microsoft’s cloud, we host the platform on Azure. We use various resources such as:
- Storage accounts, for archiving source files provided by HERE.
- Containerized web applications, for deploying the OSRM routing engine.
- Azure Functions, for performing or orchestrating specific tasks.
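As an illustration of how a deployed OSRM instance is typically queried, here is a minimal sketch using OSRM's HTTP API: building a `/route` request URL and extracting the route duration from the JSON response. The base URL and coordinates are illustrative placeholders, not exo's actual deployment:

```python
# Minimal sketch of querying an OSRM routing service over its HTTP API.
# Base URL and coordinates below are illustrative placeholders.

def build_route_url(base_url: str, coords: list[tuple[float, float]]) -> str:
    """Build an OSRM /route request URL from (longitude, latitude) pairs."""
    pairs = ";".join(f"{lon},{lat}" for lon, lat in coords)
    return f"{base_url}/route/v1/driving/{pairs}?overview=false"

def route_duration_seconds(response_json: dict) -> float:
    """Extract the duration (in seconds) of the first route in an OSRM response."""
    return response_json["routes"][0]["duration"]

# Example: two points in the Montréal area (approximate coordinates).
url = build_route_url("http://localhost:5000",
                      [(-73.5673, 45.5017), (-73.7408, 45.4706)])
# A real call would fetch this URL over HTTP; here we parse a sample payload
# shaped like an OSRM response.
sample = {"code": "Ok", "routes": [{"duration": 1260.5, "distance": 21000.0}]}
print(route_duration_seconds(sample))  # 1260.5
```

Because each route lookup is a single lightweight HTTP call, millions of origin-destination pairs can be processed by issuing such requests in bulk.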
For managing large databases, we use Snowflake. There are several general reasons for this choice:
- Snowflake is available on all major public clouds in North America, including Azure.
- Snowflake is sold as Software as a Service: as such, it greatly reduces operational tasks such as backing up or indexing the data.
- It offers the storage and computational capacities required.
- Since Snowflake offers a clear separation between storage and computational resources, several workloads can be processed simultaneously without performance degradation.
- This separation helps to better limit and control costs.
Other reasons, more specific to exo’s use cases, also explain the use of Snowflake:
- It supports geographical data types and offers various geospatial functions.
- Analysts can query databases in Snowflake using the standard SQL they already know.
- Since many connectors and drivers are available, analysts can keep on working with familiar languages and tools: Python, DBeaver, Tableau just to name a few.
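To make this kind of analysis concrete, here is a small Python sketch of one common congestion indicator: the ratio of observed speed to free-flow speed per road link. The sample records and the 0.6 congestion threshold are illustrative assumptions, not exo's actual data or methodology:

```python
# Illustrative congestion indicator: observed speed / free-flow speed per link.
# The records and the 0.6 threshold are made-up examples for illustration.

def congestion_ratio(observed_kmh: float, free_flow_kmh: float) -> float:
    """Ratio of observed to free-flow speed; lower means more congested."""
    return observed_kmh / free_flow_kmh

records = [
    {"link_id": "A", "observed_kmh": 25.0, "free_flow_kmh": 50.0},
    {"link_id": "B", "observed_kmh": 88.0, "free_flow_kmh": 100.0},
]

for rec in records:
    ratio = congestion_ratio(rec["observed_kmh"], rec["free_flow_kmh"])
    status = "congested" if ratio < 0.6 else "fluid"
    print(rec["link_id"], round(ratio, 2), status)
```

In practice, an analyst would compute such indicators directly in Snowflake with standard SQL over the full speed tables, then pull the results into Python or Tableau for visualization.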
We set up the data platform a few months ago and, since then, exo analysts have been using it on a daily basis. Results speak for themselves:
- Monthly processing of speed data for 5-minute epochs is completed overnight.
- Queries, typically run over tens of billions of rows and several terabytes of data, produce results in a matter of seconds.
- Calculating an optimal route now takes about 1 millisecond, enabling simulations of several million routes in less than 2 hours.
- Producing and visualizing various congestion indicators is straightforward.
- The platform is scalable and will automatically adjust to new workloads.
- Last but not least, all that is achieved with a very reasonable cloud consumption budget.
Needless to say, these performance gains are impressive. But the real benefit of the new data platform goes beyond speed: geospatial analysts can now tackle problems that were very complex, if not impossible, to solve just a few months ago. Analyzing traffic congestion is now business as usual at exo.
Now that’s a success!