Distributed Hadoop MapReduce – course recommendation

Title: Design and develop a Distributed Recommendation System on Hadoop

Problem statement: 

Given two CSV data sets:

(a) A course dataset containing details of courses offered

(b) A job description dataset containing a list of job descriptions 

(Note: Each field of a job description record is demarcated by ” “)

Design and implement a distributed recommendation system that uses these data sets to recommend the best courses for up-skilling, given a job description. Use the data sets to train the system, and hold out some job descriptions that are not in the training set for testing.

How you select features and build the training step that matches courses to job profiles is left up to you.
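
Since the choice of features is open, one possible starting point is plain text similarity. The sketch below is an assumption, not a prescribed method: it represents each course description and the job description as a TF-IDF vector and ranks courses by cosine similarity. The function names (tokenize, tfidf_vectors, cosine, recommend) and the course titles in the usage note are hypothetical and do not come from the data sets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term count * smoothed IDF)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(job_text, courses, top_k=3):
    """Rank courses (a {title: description} dict) against one job description.

    Returns up to top_k titles with a non-zero similarity, best match first.
    """
    titles = list(courses)
    vectors = tfidf_vectors([courses[t] for t in titles] + [job_text])
    job_vector = vectors[-1]
    scored = [(cosine(job_vector, vec), title)
              for vec, title in zip(vectors, titles)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_k] if score > 0]
```

For example, with made-up courses such as {"Data Engineering": "hadoop mapreduce python pipelines", "Accounting 101": "bookkeeping and financial statements"}, calling recommend("hadoop developer, python, mapreduce experience", courses) would rank "Data Engineering" first and drop the accounting course because it shares no terms with the job text. This version runs on one machine; the term counting it relies on is the part that maps naturally onto MapReduce.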

Use MapReduce and Python.
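
One common way to combine MapReduce with Python is Hadoop Streaming, where the mapper and reducer are scripts that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below assumes that setup and shows only a term-counting job, which is one possible building block for feature extraction. The names map_lines, reduce_lines, and main are hypothetical.

```python
import re
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'term<TAB>1' for every alphabetic token in every line."""
    for line in lines:
        for term in re.findall(r"[a-z]+", line.lower()):
            yield f"{term}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each term.

    Assumes the input is sorted by key, which is how Hadoop Streaming
    delivers mapper output to a reducer.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for term, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{term}\t{total}"


def main(mode, stream_in, stream_out):
    """Run one step; mode is 'map' or 'reduce' (a hypothetical convention)."""
    step = map_lines if mode == "map" else reduce_lines
    for out_line in step(stream_in):
        stream_out.write(out_line + "\n")

# In a real streaming script, the file would end with a call such as:
#     main(sys.argv[1], sys.stdin, sys.stdout)
```

The same pair of functions can be checked locally without a cluster by sorting the mapper output before passing it to the reducer, which imitates the shuffle-and-sort phase.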

NOTE: Combine all the data_job_posts.csv parts into a single CSV file. I had to split the file because of a size limitation.
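
A minimal sketch of the combining step follows. It assumes that every part is itself a valid CSV file that starts with the same header row; if only the first part carries a header, the header check would need to change. The function name combine_csv_parts and the file names in the usage note are hypothetical.

```python
import csv


def combine_csv_parts(part_paths, out_path):
    """Concatenate CSV parts into out_path, writing the header row once.

    Assumes each part starts with the same header row.
    Returns the number of data rows written.
    """
    rows_written = 0
    header = None
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for path in part_paths:
            with open(path, newline="", encoding="utf-8") as part_file:
                reader = csv.reader(part_file)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip an empty part
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{path} has a different header row")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call might look like combine_csv_parts(sorted(glob.glob("data_job_posts*.csv")), "combined_job_posts.csv"), where both the glob pattern and the output name are placeholders for whatever the parts are actually called. Using the csv module rather than raw line concatenation keeps quoted fields that contain commas or line breaks intact.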