STEMtech has a client in the Atlanta area looking for a Data Engineer who will help optimize operations by manipulating and aggregating disparate operational and back-office data sources into a format that is easily digestible by data scientists and statistically adept colleagues. His/her core responsibilities will be to combine large volumes of disparate, complex data, conduct quality checks on the data, manipulate the data, and ensure continuous access to a clean version of the operational data for data scientists and other stakeholders. In addition, he/she will assist in developing the data pipeline to ensure ongoing data collection, consolidation, and management.
· Creates data ingestion pipelines and processes based on jointly defined requirements
· Profiles and analyzes data to identify gaps and potential data quality issues; works with business SMEs to resolve these issues
· Identifies relationships between disparate data sources
· Uses Python, R, Informatica, and other Big Data tools and technologies to code the data engineering routines
· Designs and develops the data engineering routines for feature extraction, feature generation, and feature engineering
· Works with the team of data scientists and business SMEs to gather requirements and present the relevant details in the data
· Designs and jointly develops the data architecture with the data architect, and ensures its security and maintenance
· Explores suitable options, then designs and creates data pipelines (data lakes / data warehouses) for specific analytical solutions
· Identifies gaps and implements solutions for data security, data quality, and process automation
· Builds data tools and products that automate effort and make data easily accessible
· Supports maintenance, bug fixing, and performance analysis along the data pipeline
· Diagnoses existing architecture and data maturity and identifies gaps
· Gathers requirements, assesses gaps, and builds roadmaps and architectures to help the analytics-driven organization achieve its goals
· 8-10 years of experience in data engineering and data lakes using the Hadoop ecosystem
· Bachelor's degree in Computer Science or Engineering, and/or a background in Mathematics and Statistics; Master's or other advanced degree is a plus
· Previous leadership experience
· Experience with Big Data platforms (e.g., Hadoop, MapReduce, Spark, HBase, HDInsight, Databricks, Hive) and with programming languages such as UNIX shell scripting and Python
· Has used SQL, PL/SQL, or T-SQL with RDBMSs such as Teradata, MS SQL Server, and Oracle in production environments
· Experience with reporting and BI packages, e.g., Power BI, Tableau, and SAP BO
· Strong critical thinking and problem-solving skills
· Success at working on cross-functional teams to meet a common goal
· Self-starter with a high sense of urgency
All your information will be kept confidential according to EEO guidelines.