Revolutionizing Data Orchestration: An In-Depth Look at Kestra
Chapter 1: The Importance of Data Orchestration
Data orchestration is arguably one of the most crucial aspects of data management. It addresses the fundamental question: "How can we efficiently transfer data from one location to another while ensuring timely delivery and observability?" Although the methods for processing data are important, orchestrating the final assets is essential for effective data management.
Despite the availability of numerous tools for data processing, the orchestration of data remains a challenge. We are witnessing gradual advancements in both tools and methodologies, but significant progress is still needed.
Section 1.1: A Paradigm Shift in Data Orchestration
Data orchestration frameworks, both long-established and newer, have been around for many years. They are effective, cover numerous integrations, and are used by leading companies. Yet writing Python, however accessible its extensive libraries make it, remains a skill held primarily by data engineers, data scientists, and software developers.
Even though Python is generally straightforward, it can quickly become a tangled web of complexity, leading to a legacy codebase that's hard to manage. As the data industry matures, there is still a noticeable gap in the developer experience, particularly regarding orchestration tools that can scale and support all data practitioners.
The data domain often lags behind software and operations, borrowing valuable practices (like tests and CI/CD) and organizational structures (for instance, data mesh, which applies service-oriented architecture principles to data organization). Just as ReactJS has supplanted jQuery in frontend development and Terraform has taken over custom scripts in infrastructure, the data sector is naturally transitioning towards a declarative paradigm.
We already have tools like dbt and SQLMesh for ELT data transformation, and now we can add Kestra to the mix for declarative data orchestration.
Section 1.2: Introducing Kestra: A Unique Approach
Kestra may seem like just another data orchestrator, but it offers a distinct approach. This open-source tool simplifies the creation of complex workflows using a declarative language, specifically YAML. By using a domain-specific language (DSL) for data orchestration, Kestra broadens access to a wider audience. Users do not need to possess advanced coding skills to build workflows.
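To make that concrete, here is a minimal sketch of what a declarative Kestra flow can look like. The overall structure (id, namespace, tasks, triggers) reflects Kestra's YAML DSL; the fully qualified type names shown for the core Log task and the Schedule trigger can vary between Kestra versions, so treat them as illustrative rather than exact.

```yaml
id: hello_world
namespace: company.team

tasks:
  # Core Log task; the fully qualified type name may differ in older Kestra releases.
  - id: say_hello
    type: io.kestra.plugin.core.log.Log
    message: Hello from a declarative workflow!

triggers:
  # Run the flow every morning at 09:00; trigger type name is illustrative.
  - id: daily
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 9 * * *"
```

The whole workflow, including its schedule, lives in one readable file that non-developers can follow, review, and version alongside the rest of the codebase.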
While primarily targeting data-centric professionals, Kestra also caters to those who may not be fully technical. It accelerates the development process for data practitioners who prefer not to manage a complex Python codebase.
One of Kestra's standout features is its flexibility. Users can run Python, use dbt, integrate Airbyte, and even connect to Google Sheets. And if requirements extend beyond the standard data stack, Kestra accommodates unique needs with capabilities like Docker support, custom plugins, and webhooks.
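As a hedged sketch of that flexibility, the flow below chains a Python step into a dbt run. The plugin type names and properties are recalled from Kestra's scripts and dbt plugins and should be checked against the plugin catalog for your version; consider them assumptions, and note that a real dbt task would also need project and profile configuration.

```yaml
id: python_then_dbt
namespace: company.analytics

tasks:
  # Ad-hoc Python step (scripts plugin); the type name is an assumption to verify.
  - id: extract
    type: io.kestra.plugin.scripts.python.Script
    script: |
      import json
      rows = [{"id": 1, "value": 42}]
      print(json.dumps(rows))

  # Hand the transformation off to dbt (dbt plugin); properties are illustrative only.
  - id: transform
    type: io.kestra.plugin.dbt.cli.DbtCLI
    commands:
      - dbt run
```

The point is less the exact syntax than the fact that Python, dbt, or any other tool becomes one more declarative task in the same flow.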
Moreover, Kestra allows users to build workflows directly in the UI, keeping the documentation close at hand and visualizing the workflow as it takes shape. This significantly reduces the context switching that often plagues orchestration work.
Chapter 2: Scalability and Future Prospects
Kestra is designed for scalability, supporting both horizontal and vertical scaling. While not everyone prioritizes scalability, neglecting it when building a data infrastructure can lead to serious pitfalls down the line.
Looking ahead, Kestra presents itself as an innovative tool with a broader scope and impressive features. If you find your current orchestrator lacking, or if you face challenges with legacy systems and cumbersome abstractions, Kestra is worth considering.
As the Product Owner and Data Engineer at Kestra, I recognize the potential bias in this perspective. While I've enjoyed working with Dagster and experimenting with Prefect, they often feel like iterations of Airflow, resulting in complex, heavily decorated Python code that can become unmanageable.
Moreover, many stakeholders outside the data engineering team—like data analysts, business developers, and product managers—lack the time or expertise to navigate the intricacies of Python code and its dependencies. As Grudin's law suggests, if those who benefit from technology are not the ones doing the work, the technology is likely to falter.
This is a significant issue within the data landscape. Ultimately, the benefits of data should extend beyond data engineers to the broader business context. While data engineers will continue to play a vital role, it’s essential to democratize the data craft, enabling more individuals to engage with it.
The Kestra team is committed to addressing these challenges and enhancing the platform to ensure its growth and usability. With a strong feature set, a dedicated team, and a forward-thinking vision, Kestra could become an essential asset for data orchestration.
For a deeper technical understanding of Kestra, I recommend reading the introduction by my colleague Loic or exploring our comprehensive documentation. Don't hesitate to check out the live demo on the Kestra website, and feel free to reach out to me with any questions or feedback!
As someone with a background in diverse industries such as journalism, retail, professional sports, and music, I am passionate about data. If you're seeking valuable resources and insights, consider subscribing to my newsletter—👀 From An Engineer's Sight.