Director – Hadoop Engineer


Responsible for designing and implementing big data engineering solutions for clients, leveraging hands-on experience in data ingestion, transformation, and data flow design. Capable of working independently as well as with team members to drive customer satisfaction and successful consulting engagements. Collaborates with Marketing, Sales, Delivery, and Operations to achieve goals, and assists in guiding the Big Data team to meet client requirements.


  • Provide a solution design that meets both business requirements and Hadoop best practices
  • Perform collection, cleansing, processing, and analysis of new and existing data sources, defining and reporting data quality and consistency metrics
  • Acquire Big Data Certifications if required
  • Participate in working sessions with technical executives and experts
  • Learn and stay current on Big Data techniques, developments, and improvements
  • Set up Hadoop clusters.
  • Perform backup, recovery, and maintenance duties. Must have subject matter expertise and hands-on delivery experience with popular Hadoop distribution platforms such as Cloudera, Hortonworks, and/or MapR.


Location and Travel

  • Remote or Randolph office and occasional travel to client worksites.

Qualifications and Skills


  • 3+ years (preferably 5+) in Big Data Engineering, including data export/import between RDBMS and HDFS, and real-time/near-real-time streaming data ingestion and transformation.
  • 5+ years of experience administering Linux production environments.
  • 3+ years of experience managing a full-stack Hadoop distribution (preferably Cloudera), including monitoring.
  • 3+ years of experience implementing and managing Hadoop-related security in Linux environments (Kerberos, SSL).
  • 3+ years actively consulting or leading a consulting group/practice.



  • Bachelor’s Degree in Computer Science or a relevant technical field; advanced degree preferred.


  • Cloudera Certification preferred


  • Strong SQL and HiveQL skills (Java/MapReduce and Python are a plus)
  • Understanding of major RDBMSs such as Oracle, MySQL, PostgreSQL, SQL Server, DB2, and Sybase
  • Working knowledge of data compression and partitioning techniques, Avro/Parquet formats, and performance tuning
  • Ability to debug and interpret Hadoop/YARN log files
  • Working knowledge of automating/scheduling data flows with Oozie (via both the GUI and scripting)
  • Working knowledge of the Hadoop ecosystem: YARN, HDFS, Sqoop, Hive/Impala, Oozie, Flume, Kafka, Solr
  • Proficiency with Linux and Bash scripting (AWK and/or sed is a plus)
  • A strong understanding of data profiling and data cleansing techniques
  • Solid understanding of ETL architectures, data movement technologies, and working knowledge of building data flows.
  • Proven track record of driving rapid prototyping and design
  • Strong analytical and problem-solving skills with proven communication and consensus building abilities.
  • Proven skills to work effectively across internal functional areas in ambiguous situations.
  • Excellent organization and planning skills
  • High degree of professionalism
  • Ability to thrive in a fast-paced environment.
  • Proficiency with Microsoft Office; QlikView and Qlik Sense a plus

Employment Type

  • Full-time Employee


Benefits

  • Competitive salary
  • Medical
  • 401(k)