Design and develop high-throughput, low-latency data processing pipelines to quickly ingest and make data available on the platform across various distributed data stores
Analyze large datasets to identify opportunities to tune and improve the system
Experiment with Hadoop frameworks such as Hive, Pig, and Scalding to identify the optimal approach for extracting valuable insights from massive datasets
Tools We Use:
Scala
Hadoop (Hive, Pig, Scalding, Spark)
Kafka
MySQL, Redis, Vertica, Aerospike
Qualifications
Requirements:
Bachelor's or Master's degree in Computer Science or a related field
3+ years of experience ingesting, processing, storing, and querying large datasets
Professional Hadoop ecosystem experience, including storage optimization and job performance tuning
Expertise in Java, Python, or a similar language; functional programming experience is a plus
Passion for code correctness and an intuition for which data values to expect in a business context
Additional Information
All your information will be kept confidential according to EEO guidelines.