Data Engineering - Guidelines

Data Engineering is a young profession that combines knowledge, practices, and capabilities from software engineering, Big Data, and production operations.
Data engineers are typically responsible for building data pipelines that bring together information from different source systems. They integrate, consolidate, and cleanse data, and structure it for use by data analysts and data scientists, or, as at aiOla, as part of the product itself. Their aim is to make data easily accessible and to optimize their products' or organization's big data ecosystem (https://www.techtarget.com/searchdatamanagement/definition/data-engineer).

On this page, we list the Data Engineering guidelines that differ from standard software engineering practice.

See best practices by the legendary Guy Ernest: https://guyernest.medium.com/building-a-successful-modern-data-analytics-platform-in-the-cloud-4be1946b9cf5

I will detail each principle and give an example to make it concrete.

  • Build small, fast & iterate – solve specific data challenges; refactor when needed
  • Maintainability in mind – your code will be changed by others
  • KISS – if it is complex, change it
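As a minimal sketch of "build small & iterate" and KISS (all names here are hypothetical, not from a real codebase): a pipeline composed of small, single-purpose steps is easy to test, swap, and refactor independently.

```python
def extract(rows):
    """Pretend source: yield raw records as-is."""
    yield from rows

def cleanse(records):
    """Drop records with no id and strip whitespace from names."""
    for rec in records:
        if rec.get("id") is None:
            continue
        yield {**rec, "name": rec.get("name", "").strip()}

def load(records):
    """Collect into the 'target' (here just a list) and return it."""
    return list(records)

raw = [{"id": 1, "name": " Ada "}, {"id": None, "name": "ghost"}]
result = load(cleanse(extract(raw)))
print(result)  # → [{'id': 1, 'name': 'Ada'}]
```

Each step can be replaced or extended on its own, so solving the next specific data challenge does not require rewriting the whole pipeline.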


  • Learnability – learn and share knowledge
  • Data-aware – use data-specific interfaces (such as boto3) where they fit
  • Data immutability – keep source data unchanged so you can always rerun a pipeline
  • Scale in mind – know the scale you need to support
  • Buy, not make – if someone already developed it and it is production-grade, seriously consider using it
  • Surprises – don't be surprised; there will always be something unpredictable (the data, the tech…), so be prepared to work hard to overcome it
  • Resilience – a single error should not break the pipeline
  • Production-ready – any code can end up in production
  • Errors & monitoring – surface and share your errors
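To illustrate the resilience and errors & monitoring principles together, here is a hedged sketch (hypothetical function and record names): each record is processed in isolation, a bad record is logged and set aside, and the run completes regardless.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("pipeline")

def parse_amount(rec):
    # May raise KeyError or ValueError on bad input.
    return float(rec["amount"])

def process(records):
    """Resilience: one bad record is logged and skipped;
    it never breaks the whole run."""
    good, failed = [], []
    for rec in records:
        try:
            good.append(parse_amount(rec))
        except (KeyError, ValueError) as exc:
            # Errors & monitoring: surface every failure instead of hiding it.
            logger.warning("skipping record %r: %s", rec, exc)
            failed.append(rec)
    return good, failed

records = [{"amount": "10.5"}, {"amount": "oops"}, {}]
good, failed = process(records)
print(good)         # → [10.5]
print(len(failed))  # → 2
```

Keeping the failed records (rather than silently dropping them) also supports data immutability: the rejected inputs remain available for inspection and for a rerun after the bug is fixed.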
