Arrival on Time: Smart Devices for Efficient Railways at Goldschmidt

From Serverless to Monolith
28.05.2021

Through its group of companies, Goldschmidt combines all the expertise needed for the construction, maintenance, inspection and monitoring of a railway network. This network of highly qualified experts addresses the many varied requirements of rail infrastructure projects. Special expertise, excellent technical equipment and highly qualified personnel allow Goldschmidt to carry out each task to a consistently high standard of quality worldwide.

The Project

Goldschmidt is a pioneer in the maintenance, inspection and digitalization of railway infrastructure. The digital platform Data Acquisition for Rail Infrastructure (Dari®) enables Goldschmidt’s customers to bring Goldschmidt’s devices and railway maintenance data together in a centralized location for analysis and further processing. This allows railway maintenance companies to simplify device management and to streamline maintenance job approval and invoicing workflows. Railway network providers can use this data for quality management and predictive maintenance, i.e. determining the condition of railways in order to estimate when maintenance should be performed.

The Challenge

Early on, Goldschmidt and a partner established a production platform whose features stood out compared to competitors’ solutions. Over time, however, development slowed down and production deployments became complicated.

The application stack consisted of an Angular frontend with a serverless backend of about 100 AWS Lambda functions as well as DynamoDB and MariaDB for persistence.

Let’s take a moment to briefly discuss benefits and drawbacks of a serverless architecture.

One benefit of a serverless architecture is reduced operational costs, because developers do not need to think about provisioning servers and can focus on the implementation. This makes it easy to get started and to deploy code to production. With most cloud providers, you pay only for the compute time you consume, so you never pay for over-provisioned infrastructure. Developers (or teams) can also choose a programming language for each function, provided the cloud provider supports it, which is a major benefit when large or multiple teams work on a project. Horizontal scaling is automatic, elastic and managed by the provider. Further benefits are the logging and security functions built into the platform (e.g. function-specific principals with least privilege).

A drawback of a serverless architecture is that data model consistency can be hard to establish. In a backend application, different pieces of functionality often need to access the same entities and thus share code. While code sharing on AWS has improved with the introduction of Lambda layers, it still needs to be considered and managed. Serverless functions are also still a fairly new concept: the cloud providers’ implementations are changing, and so are the frameworks around them. The Serverless Application Framework, while a very powerful tool, has in some cases proved to be immature and brittle, especially when it comes to embedding and integrating functions with surrounding infrastructure such as API gateways, S3 buckets, etc. Another drawback of serverless, as with any distributed system, is the increased complexity of observability. Monitoring, tracing, alerting and debugging are inherently more difficult than with a monolithic approach. Testing can also be quite difficult. Unit tests only cover small parts of the code in isolation, so their complexity does not increase. Integration testing, however, becomes hard because everything is distributed from the start and the integration of functions relies on vendor-specific services that need to be simulated.

The Solution

kreuzwerker’s approach to addressing the main issues of the Dari® platform was to re-architect it. In our opinion, the drawbacks of a serverless architecture compared to a monolithic approach outweighed the benefits in this case. The path we took could therefore be labeled From Serverless to Monolith (on the web, you will usually find stories about the opposite direction).

In the beginning, we migrated the whole infrastructure to incorporate common best practices according to the AWS Well-Architected Framework with regard to security, account separation and backup strategies. Then we created a new monolithic backend application, which continuously evolved to handle the core business logic, such as data and user management, taking over functionality from the serverless functions step by step. This allowed us to test logical blocks of functionality while also simplifying deployment, because each migration removed a set of deployables and persistence stores. Code sharing and integration testing are more “natural” and require less effort in a monolith than in a distributed serverless environment. We chose a hexagonal architecture for this application to get a clean separation between the domain, application, persistence and API layers. This made it easier to replace persistence providers without changing the domain logic and improved the testability of each individual layer. Using Testcontainers, we were able to develop features quickly and test their integration with external services such as databases and AWS services (emulated with LocalStack). In addition, the introduction of end-to-end testing with Cypress increased reliability and helped to find defects before they could reach production.
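To illustrate the layering idea, here is a minimal sketch of a hexagonal (ports and adapters) structure. It is not the Dari® code: the article does not state the backend's language or any class names, so Python and every identifier below (Device, DeviceRepository, InMemoryDeviceRepository, DeviceService) are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


# Domain layer: a plain entity that knows nothing about storage or HTTP.
# (Hypothetical entity; the real domain model is not described in the article.)
@dataclass(frozen=True)
class Device:
    device_id: str
    name: str


# Port: the interface the application layer depends on.
class DeviceRepository(Protocol):
    def save(self, device: Device) -> None: ...

    def find(self, device_id: str) -> Optional[Device]: ...


# Adapter: one possible persistence implementation. A database-backed
# adapter could replace it without touching the service below.
class InMemoryDeviceRepository:
    def __init__(self) -> None:
        self._devices: Dict[str, Device] = {}

    def save(self, device: Device) -> None:
        self._devices[device.device_id] = device

    def find(self, device_id: str) -> Optional[Device]:
        return self._devices.get(device_id)


# Application layer: business rules expressed only against the port.
class DeviceService:
    def __init__(self, repository: DeviceRepository) -> None:
        self._repository = repository

    def register(self, device_id: str, name: str) -> Device:
        if self._repository.find(device_id) is not None:
            raise ValueError(f"device {device_id} is already registered")
        device = Device(device_id, name)
        self._repository.save(device)
        return device


if __name__ == "__main__":
    service = DeviceService(InMemoryDeviceRepository())
    print(service.register("d-1", "example device"))
```

Because DeviceService only sees the DeviceRepository port, a test can hand it the in-memory adapter, while an integration test could hand it an adapter pointing at a containerized database. That is roughly the kind of per-layer testability the paragraph above refers to.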

CI/CD for all deployables and infrastructure as code is a kreuzwerker best practice, which we naturally established for the Dari® platform as well. When we took over, the work to consolidate the code repositories had only just started. As a first step, we therefore bundled over 100 code repositories into about 20 and created build jobs and deployment pipelines accordingly. This allowed us to deploy each change automatically and to complete a full production deployment within 20 minutes.

Additionally, we established monitoring of technical and business key performance indicators by providing metrics and logs and creating informative dashboards, as well as alerts.

All this was done in parallel to feature development, such as the integration of new devices into the Dari® platform.

Conclusion

As Dari® is a specialized platform with neither demanding scaling requirements nor multiple engineering teams working on it simultaneously, the benefits of a serverless architecture did not come into play.

Introducing fully automated CI/CD pipelines and migrating step by step from a serverless architecture to a monolith has proven to be a worthwhile investment. kreuzwerker was able to speed up development cycles, improve testing and shorten production deployment time, while at the same time implementing new features and ending up with satisfied customers.

Today, Goldschmidt offers a comprehensive, worldwide range of products and services for the construction, maintenance, inspection and monitoring of railway networks. The improvements made to the Dari® platform have established a solid foundation for the digitalization of these devices and services. This will lead to better insights, improved products and more satisfied customers.

kreuzwerker’s technical expertise and AWS’ infrastructure and managed services will continue to support this vision going forward.