K
karakanb
Every data pipeline job I had to tackle required quite a few components to set up:
In its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.
Bruin supports quite a few stuff:
It includes an open-source VS Code extension as well, which allows working with the data pipelines locally, in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless, it just adds a nice layer.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood, install dependencies in an isolated environment, and install and manage the Python versions locally, all in a cross-platform way. Then in order to manage data uploads to the data warehouse, it uses dlt under the hood to upload the data to the destination. It also uses Arrow’s memory-mapped files to easily access the data between the processes before uploading them to the destination.
We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.
We had a small pool of beta testers for quite some time and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not often to build data tooling in Go but I believe we found ourselves in a nice spot in terms of features, speed, and stability.
GitHub - bruin-data/bruin: Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with, looking forward to your thoughts!
Best, Burak
Comments URL: Show HN: I built an open-source data pipeline tool in Go | Hacker News
Points: 136
# Comments: 28
Continue reading...
- One tool to ingest data
- Another one to transform it
- If you wanted to run Python, set up an orchestrator
- If you need to check the data, a data quality tool
In its core, Bruin is a data framework that consists of a CLI application written in Golang, and a VS Code extension that supports it with a local UI.
Bruin supports quite a few stuff:
- Data ingestion using ingestr (GitHub - bruin-data/ingestr: ingestr is a CLI tool to copy data between any databases with a single command seamlessly.)
- Data transformation in SQL & Python, similar to dbt
- Python env management using uv
- Built-in data quality checks
- Secrets management
- Query validation & SQL parsing
- Built-in templates for common scenarios, e.g. Shopify, Notion, Gorgias, BigQuery, etc
It includes an open-source VS Code extension as well, which allows working with the data pipelines locally, in a more visual way. The resulting changes are all in code, which means everything is version-controlled regardless, it just adds a nice layer.
Bruin can run SQL, Python, and data ingestion workflows, as well as quality checks. For Python stuff, we use the awesome (and it really is awesome!) uv under the hood, install dependencies in an isolated environment, and install and manage the Python versions locally, all in a cross-platform way. Then in order to manage data uploads to the data warehouse, it uses dlt under the hood to upload the data to the destination. It also uses Arrow’s memory-mapped files to easily access the data between the processes before uploading them to the destination.
We went with Golang because of its speed and strong concurrency primitives, but more importantly, I knew Go better than the other languages available to me and I enjoy writing Go, so there’s also that.
We had a small pool of beta testers for quite some time and I am really excited to launch Bruin CLI to the rest of the world and get feedback from you all. I know it is not often to build data tooling in Go but I believe we found ourselves in a nice spot in terms of features, speed, and stability.
GitHub - bruin-data/bruin: Build data pipelines with SQL and Python, ingest data from different sources, add quality checks, and build end-to-end flows.
I’d love to hear your feedback and learn more about how we can make data pipelines easier and better to work with, looking forward to your thoughts!
Best, Burak
Comments URL: Show HN: I built an open-source data pipeline tool in Go | Hacker News
Points: 136
# Comments: 28
Continue reading...