by Mridu Bhatnagar
Today, we are moving towards machine learning: making predictions and finding insights based on data. The first step toward that goal is to have efficient processes in place for collecting data from many different sources. Collecting data the traditional way is tedious and cumbersome, and manually running scripts to extract, transform and load data costs a lot of time. To make the process efficient, the data pipeline can be automated, for example by scheduling the extraction scripts with crontab. However, crontab has its own drawbacks, and one major challenge is monitoring. This is where Apache Airflow, an open-source tool originally built by Airbnb's engineering team, can help. Airflow is a platform to programmatically author, schedule and monitor workflows.
Talk Details
Date: Sunday Nov. 17
Location: Sky Room
Begin time: 12:45
Duration: 25 minutes