Data Engineering!
Although it is essential for continuously optimizing and validating business models, data engineering is still a comparatively young and loosely standardized discipline. All the more reason it calls for experienced, smart and open-minded expertise, and for support that, in addition to “design and build”, zealously practices “enable, and make yourself redundant”.
For us, it goes without saying that we develop a thorough working understanding of everything we are involved in. To make this possible, we concentrate our company's data efforts on three specific project types: our service offerings, described below.
That said, we are happy to offer other forms of cooperation and expertise as well, as long as they fit and serve your goals.
AWS First
Since the company was founded in 2010, engineering at kreuzwerker has meant “AWS only”. Accordingly, our data engineering work also focuses on AWS as the target platform.
Naturally, we have deep knowledge of the AWS ecosystem, and by “we” we really mean each and every kreuzwerker! As a result, we enjoy an unusually strong market reputation and visibility for a company of around 120 employees.
Our concrete AWS Data Service Offerings
Migration of your data analytics environment (DWHs, pipelines, dashboards) to AWS - gladly all of it
Those who migrate to AWS have their reasons. Cost alone is rarely the only one, and if it is, could you be missing out on opportunities with this approach? It's worth thinking about. In this context, it often even makes sense to fundamentally replace the existing data analytics environment (e.g. Oracle, SQL Server). In any case, the first essential building block is a precise, individual and holistic migration strategy: from lift & shift through “modernization” to complete re-architecting, and from “this fiscal year” to a 5-year plan. Always based on AWS best practices.
We like to treat the resulting plan as a coordinated recipe that we then carry out ourselves, even as a fixed-price work contract.
Data Analytics Platforms - Systematize your analytical handling of data
For us, a data analytics platform is specifically a collection of services and features that lets your stakeholders pose complex questions and queries against very large data volumes and expect an answer, at the latest once your data engineers have made the necessary enhancements with little effort. The results are then combined, analyzed, examined and visualized as usual. Data analytics platforms usually combine several big data tools and take care of scaling, availability, security and performance behind the scenes. Not to mention blueprints, APIs and other “standard components” that simplify maintenance in a large, distributed context and avoid “hit by a truck” trouble when key people are suddenly unavailable.
A data analytics platform is data engineering 2.0 and may follow a (cloud-migrated? 😃) DWH as the next evolutionary step. Our services in this area:
- Consulting on the conception of a data platform
- Development of access and governance concepts
- Provision of self-service offerings and templates for internal stakeholders
- Cloud coaching and enablement so internal data engineers and data analysts can use the platform
- Continuous development of the data platform (e.g. optimizing pipelines, connecting additional data sources)
We create and optimize data pipelines
A service provider that can successfully migrate to AWS and design and implement entire data analytics platforms probably knows how to build pipelines. Still, pipelines are a particular passion of ours: designing them to be smart, load- and future-proof and, at the same time, target-oriented (expensive!) is an intellectual challenge.
We create the architecture for the data pipelines and design the ETL processes and the target schema. We also plan the required data cleansing and anonymization procedures to ensure GDPR (DSGVO) compliance. We always keep performance in mind and can meet critical real-time and batch requirements.
We also support visualization and delivery to the various stakeholders, but gradually fade into the technical background there. Everything else we build as usual.
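The cleansing and anonymization step mentioned above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a description of any specific client pipeline: the column names, the `REQUIRED_FIELDS` and `DIRECT_IDENTIFIERS` lists and the hard-coded key are all hypothetical. Note that keyed hashing (HMAC) is pseudonymization rather than full anonymization in the GDPR sense: whoever holds the key can re-link tokens to people, so the output is still personal data and the key must be protected.

```python
import hashlib
import hmac
from typing import Iterable, Iterator

# Hypothetical secret key; in a real pipeline it would come from a secrets
# store (e.g. AWS Secrets Manager), never from source code.
PSEUDONYM_KEY = b"replace-with-a-secret-key"

# Hypothetical column names for a customer-events feed.
DIRECT_IDENTIFIERS = ("name", "phone")
REQUIRED_FIELDS = ("email", "event", "amount")


def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Map a value to a stable 64-char token via HMAC-SHA256 (same input, same token)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def clean_and_pseudonymize(rows: Iterable[dict]) -> Iterator[dict]:
    """Cleansing plus pseudonymization as one transform step of a small ETL job."""
    for row in rows:
        # Cleansing: strip whitespace from string fields (builds a new dict,
        # so the caller's input rows are not mutated).
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        # Cleansing: skip records with missing or empty required fields.
        if any(row.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Normalize before hashing so "Alice@Example.com" and
        # "alice@example.com" map to the same token.
        email = row.pop("email").lower()
        # Drop direct identifiers that downstream analysts do not need.
        for col in DIRECT_IDENTIFIERS:
            row.pop(col, None)
        row["customer_id"] = pseudonymize(email)
        row["amount"] = float(row["amount"])
        yield row


if __name__ == "__main__":
    raw = [
        {"email": " Alice@Example.com ", "name": "Alice", "event": "purchase", "amount": "19.90"},
        {"email": "alice@example.com", "name": "Alice", "event": "refund", "amount": "-19.90"},
        {"email": "", "name": "Bob", "event": "purchase", "amount": "5"},
    ]
    for record in clean_and_pseudonymize(raw):
        print(record)
```

Using a keyed HMAC instead of a plain SHA-256 hash matters because low-entropy values such as email addresses could otherwise be recovered by hashing lists of candidate addresses. In a production setup, the same logic could run inside a Spark or AWS Glue job instead of over Python dicts.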
And now briefly & encyclopedically
Technical people are bad at leaving things out and at the kind of reduction “advertising” demands. That's why we'll just quickly list a few of the technologies we are good enough at that it's fair for you to pay us for them.
I
We are particularly drawn to the following technologies; you can ask any of our data engineers about them day and night:
- Python and the complete PyData stack (Pandas, NumPy, PySpark)
- Spark
- AWS EMR / AWS EMR Serverless
- AWS Glue
- AWS Lake Formation
- Palantir Foundry
- All common Infrastructure as Code tools such as AWS CDK (with TypeScript, Python, Java, Go) or Terraform
II
And so on - further technology focuses:
- Airflow / Amazon Managed Workflows for Apache Airflow (MWAA)
- Scala
- Java
- R
- Elasticsearch
- (almost) every relational and non-relational database available with the corresponding query languages (AWS RDS, AWS Aurora, PostgreSQL, MySQL, Oracle Database, Microsoft SQL Server, MongoDB, AWS DynamoDB…)
- Presto
- AWS Athena
- Mara ETL
- Google Cloud Dataproc
- Delta Lake
- AWS Redshift / AWS Redshift Serverless
- Snowflake
- Metabase
- Google Cloud BigQuery
- Tableau
- AWS QuickSight