- Design, develop, test, implement, and maintain code, information architecture, and conceptual models to support data processing and data flows through the data lake, across its zones:
- Landing or Source Zone – data ingestion of raw data or capture of streaming data
- Reporting Zone – transform data into a data model for external consumption by reporting & self-service BI
- Sandbox Zone – evaluate data quality, transform raw data, and cleanse data
- Build large-scale data processing systems, serve as an expert in data warehousing solutions, and work with the latest (NoSQL) database technologies.
- Embrace the challenge of dealing with petabytes of data on a daily basis.
- Understand how to apply technologies to solve big data problems and to develop innovative big data solutions.
- Apply extensive knowledge of building data processing systems with Hadoop and Hive using programming or scripting languages (a minimal Hive access sketch follows this list).
- Demonstrate expert knowledge of different (NoSQL and RDBMS) databases.
- Implement complex big data projects focused on collecting, parsing, managing, analyzing, and visualizing large data sets across multiple platforms to turn information into insight.
- Recommend, design, implement, and maintain file formats (e.g. XML/XSD, SequenceFile, Avro, or Parquet) for information interchange between applications, external systems, third-party applications, and the data lake (see the file-format sketch after this list).
- Review and evaluate database performance, and conduct risk, financial, and feasibility analyses.
- Investigate and repair application defects regardless of component, including platform, business logic, data processing logic, or database (SQL and data modeling).
- Develop prototypes and proofs of concept for selected solutions.
- Develop data and metadata policies and procedures
- Implement and maintain operational and disaster-recovery procedures.
- Participate in the review of code and/or systems for proper design standards, content and functionality.
- Participate in all aspects of the Systems Development Life Cycle
- Analyze files and map data from one system to another
- Adhere to established source control versioning policies and procedures
- Meet timeliness and accuracy goals.
- Communicate status of work assignments to stakeholders and management.
- Responsible for technical and production support documentation in accordance with department standards and industry best practices.
- Maintain current knowledge of new developments in technology-related industries
- Participate in corporate quality and data governance programs
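
To make the Hadoop/Hive scripting bullet above concrete, here is a minimal Python sketch using PyHive; the host, port, credentials, and table name are illustrative assumptions, not details from this posting:

```python
# Minimal sketch: querying a Hive table from Python with PyHive.
# The host, username, and table below are hypothetical placeholders.
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000,
                       username="etl_user", database="default")
cursor = conn.cursor()

# Pull a small sample of raw landing-zone records for inspection.
cursor.execute("SELECT * FROM landing_events LIMIT 10")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```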
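
Similarly, for the interchange formats named above, a minimal sketch of writing and reading a Parquet file with pyarrow might look like the following; the lake paths and column names are hypothetical:

```python
# Minimal sketch: writing and reading Parquet, one of the interchange
# formats named above. Paths and columns are hypothetical placeholders.
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small in-memory table (a stand-in for cleansed sandbox output).
table = pa.table({
    "event_id": [1, 2, 3],
    "payload": ["a", "b", "c"],
})

# Write to a reporting-zone style path for downstream BI consumption.
pq.write_table(table, "/datalake/reporting/events.parquet")

# Read it back to verify the round trip.
print(pq.read_table("/datalake/reporting/events.parquet").to_pydict())
```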
QUALIFICATIONS & EXPERIENCE
- 5+ years of systems/application analysis & design experience
- 3+ years of data modeling & database administrator experience
- 3+ years of experience in designing, building, and using a big data distribution, preferably MapR (or Hortonworks, Cloudera), for:
  - data ingestion, cleansing, and transformation (e.g. Talend, Sqoop)
  - data discovery & analysis using querying tools (e.g. Impala, Hive)
  - data storage using distributed databases (e.g. HBase, Kudu)
  - data streaming (e.g. Kafka, Apache Spark; see the streaming sketch after this list)
  - data visualization (e.g. Tableau, Qlik, Lumira)
  - process monitoring (e.g. MapR manager, Hue)
- Bachelor’s Degree in Information Technology or related field preferred
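
As a hedged illustration of the streaming stack listed above (Kafka feeding Spark, landing raw data in the lake), a minimal Spark Structured Streaming sketch could look like this; the broker address, topic, and lake paths are placeholder assumptions:

```python
# Minimal sketch: streaming ingestion from Kafka into a landing-zone
# Parquet sink with Spark Structured Streaming. The broker, topic,
# and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("landing-zone-ingest").getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker.example.com:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers values as bytes; cast to string before landing them.
query = (raw.selectExpr("CAST(value AS STRING) AS payload")
         .writeStream
         .format("parquet")
         .option("path", "/datalake/landing/events")
         .option("checkpointLocation", "/datalake/landing/_chk/events")
         .start())

query.awaitTermination()
```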
Note: Qualified candidates will be contacted within 2 business days of application. If you do not meet the above criteria, we will keep your resume on file for future opportunities and may contact you for further discussion.