65th ISI World Statistics Congress 2025

Can we build a truly agnostic, cloud-native data platform?

Conference: 65th ISI World Statistics Congress 2025

Format: IPS Abstract - WSC 2025

Keywords: cloud platform, open source

Abstract

The promise of cloud-native technologies and open-source software has been to free organizations from the constraints of vendor lock-in, enabling them to build flexible, scalable data platforms that can operate across any cloud or on-premises environment. However, the reality is more complex. While open-source tools provide transparency and the ability to customize solutions, they are not entirely free from lock-in.

Think twice before developing your own solution

A key principle in Onyxia’s development is to carefully evaluate whether to build custom features. This applies not just at the start, but at every stage of the project. Even if a user requests something specific, it's often wiser to wait for the ecosystem to evolve, avoiding unnecessary complexity and lock-in. Agility may be great, but if you're too quick to jump on every user request, you might find yourself locked into a maze of custom solutions. Sometimes, moving slower and letting the ecosystem catch up is the real agile move.

Open-source it as fast as possible

When you develop software internally, you unintentionally create a form of lock-in. Your developers naturally focus on solving your organization's specific needs at a given moment, but those needs are likely to evolve over time. By open-sourcing your software early, you expose it to broader use cases, more diverse contributions, and a wider ecosystem of modern development practices. This helps keep your project relevant, scalable, and future-proof, rather than tethered to the constraints of internal-only development.

Prioritize standards and maintain transparency with your users

Onyxia helps users deploy a variety of services such as Jupyter, RStudio, VS Code, and databases on Kubernetes clusters, using Helm as the standard tool. Kubernetes Operators were an option, but they are more invasive: they have to be installed on the cluster itself, along with their custom resource definitions. Helm runs on the client side, which makes for a simpler, less intrusive solution in line with Onyxia's focus on transparency. Each action performed through Onyxia's interface is shown in an emulated terminal, giving users full visibility into what is being run. As a result, an organization could stop using Onyxia if needed: everything it does remains transparent and lightweight, so it causes no lock-in.
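To make the transparency argument concrete, here is a minimal Python sketch of the idea: a service launch requested from a web interface is translated into an ordinary `helm install` command that can be displayed to the user and re-run by hand. This is not Onyxia's actual code. The function `helm_install_command`, the release and chart names, the repository URL, and the chart value keys are all hypothetical; only the Helm CLI flags used (`--repo`, `--namespace`, `--set`) are real.

```python
import shlex  # shlex.join requires Python 3.8+


def helm_install_command(release, chart, repo_url, values, namespace=None):
    """Build the `helm install` command a UI could run and echo to the user.

    Hypothetical helper: argument values are illustrative, only the
    Helm flags (--repo, --namespace, --set) come from the Helm CLI.
    """
    args = ["helm", "install", release, chart, "--repo", repo_url]
    if namespace:
        args += ["--namespace", namespace]
    # One --set flag per overridden chart value, in a stable (sorted) order
    for key in sorted(values):
        args += ["--set", f"{key}={values[key]}"]
    # shlex.join quotes arguments where needed, so the line can be pasted into a shell
    return shlex.join(args)


# Hypothetical launch of a Jupyter service from a hypothetical chart repository
cmd = helm_install_command(
    release="jupyter-demo",
    chart="jupyter-python",
    repo_url="https://example.org/charts",
    values={"resources.limits.memory": "8Gi", "resources.limits.cpu": "2"},
    namespace="user-alice",
)
print(cmd)
```

Because the displayed command is plain Helm, a user who stops using the interface can still manage the same release with the Helm CLI alone (for example with `helm list` or `helm uninstall`), which is what keeps this design free of lock-in.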

Accept your remaining lock-in

Lock-in is a bit like security risk: you can't eliminate it entirely, but you can learn to live with it. Just as you approve a service once you have accepted its residual security risks, you can "homologate" your software once you have made peace with the fact that some lock-in is inevitable. Onyxia, for example, depends on containerization and object storage. At Insee, we strongly believe in this architecture and fully accept the lock-in it brings; explaining why we made those choices is beyond the scope of this presentation. The key is knowing when to stop aiming for perfect flexibility and to accept that some degree of lock-in is part of building robust systems. By committing to these lock-ins, Onyxia becomes a reusable solution for many organizations, including other national statistical institutes (NSIs).