In my previous role as a consultant Machine Learning Engineer, I worked on several different projects. Some were in-house: improving our internal workflows by connecting components end-to-end, or creating a framework to standardize parts of our data science projects. Others were large-scale client projects: building training and inference pipelines for machine learning models, putting them into production, or creating an API to serve model predictions to outside consumers.
For our client projects, I was part of various teams. Working with uniquely talented data engineers, data scientists, domain experts, client-side stakeholders and product managers, consultants from fellow consultancy firms, and the technical staff of our third-party data providers was a great pleasure and a great challenge at the same time. I gained valuable experience and a good understanding of cloud platforms and tools for productionizing and scaling data-intensive applications.
The tools I became familiar with through these experiences include AWS EC2, SQS, ECR, Batch, Athena, Glue, CodeBuild, CodeDeploy, CloudFormation and CloudWatch, along with Docker and PySpark.
In the following posts, I’ll briefly share some of the applications I built. For confidentiality reasons, I certainly won’t disclose their details. However, I can still outline their scope from an engineering perspective, and I hope they’ll convey a rough idea of what kinds of structures get built around a machine learning model to get it into production, and how different cloud components can be tied together to produce automated workflows.
So, enjoy!