Awesome open source next-generation data lakehouse tools and projects
Excellent blogs and open source learning resources are worth recording, both for later reference and for easier sharing. The running list continues below.
Dataflow: next-gen data engineering tools
- Apache Arrow (a multi-language toolbox for accelerated data interchange and in-memory processing)
- https://github.com/volcano-sh/volcano (a cloud native batch system; a project under CNCF)
- https://github.com/apache/yunikorn-core (a lightweight, universal resource scheduler for container orchestrator systems)
- https://github.com/apache/airflow/ (a platform to programmatically author, schedule, and monitor workflows)
- https://github.com/sql-machine-learning/sqlflow (brings SQL and AI together)
- https://github.com/open-metadata/OpenMetadata (an open standard for metadata; a single place to discover, collaborate, and get your data right)
- https://github.com/data-observe/datav (a fully customizable and programmable observability platform)
- https://github.com/awslabs/data-on-eks (DoEKS, a tool to build, deploy, and scale data platforms on Amazon EKS)