create database SocialNetwork;
create table USER (
  email VARCHAR(255) NOT NULL PRIMARY KEY,
  fullname VARCHAR(255) NOT NULL,
  age INT NOT NULL,
  followers_count INT NOT NULL
);
create table POST (
  post_id INT NOT NULL PRIMARY KEY,
  category VARCHAR(255) NOT NULL,
  date DATE NOT NULL,
  user_email VARCHAR(255) NOT NULL,
  FOREIGN KEY (user_email) REFERENCES USER(email)
);
create table LIKES (
  user_email VARCHAR(255) NOT NULL,
  post_id INT NOT NULL,
  PRIMARY KEY (user_email, post_id),
  FOREIGN KEY (user_email) REFERENCES USER(email),
  FOREIGN KEY (post_id) REFERENCES POST(post_id)
);
load data local infile 'users.csv' into table USER fields terminated by ',';
load data local infile 'posts.csv' into table POST fields terminated by ',';
load data local infile 'likes.csv' into table LIKES fields terminated by ',';
sqoop-import-all-tables --connect jdbc:mysql://master/SocialNetwork --username gebre -P --warehouse-dir /home/gebre/SocialNetwork -m 1
To solve this problem we used two mapper classes and one reducer class:
1. PostsMapper, which parses the posts.csv file and emits key-value pairs with the post id as the key and the category, date and user email as the value. We also prepend a tag named POST to the value string, which is later used on the reducer side to distinguish post records from likes records.
2. LikesMapper, which parses the likes.csv file and emits key-value pairs with the post id as the key and the email of the user who liked that post as the value. Here we also prepend a tag named LIKES to the value string, again used on the reducer side to distinguish likes records from post records. The image below shows the output of the two mapper classes (i.e. the data that is ready to be sent to the reducer).
- The mappers thus send to the reducer, for each post id key, the list of users who liked that post, the date the post was made, the post category and the user who made the post; a sketch of both mappers is shown below.
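As a rough illustration, the two mappers could look like the sketch below. The CSV column order is assumed to match the table definitions above, and the tab-separated POST/LIKES tag format is our own choice for the sketch, not necessarily the exact code.

// Sketch only: in practice each mapper lives in its own source file.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PostsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // posts.csv: post_id,category,date,user_email (assumed column order)
        String[] f = value.toString().split(",");
        // key = post id, value = POST tag + category, date, user email
        context.write(new Text(f[0]), new Text("POST\t" + f[1] + "," + f[2] + "," + f[3]));
    }
}

class LikesMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // likes.csv: user_email,post_id (assumed column order)
        String[] f = value.toString().split(",");
        // key = post id, value = LIKES tag + email of the user who liked the post
        context.write(new Text(f[1]), new Text("LIKES\t" + f[0]));
    }
}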
- The PostLikesReducer then receives the data from the mappers, parses it to collect all the users who liked a particular post id into a JSON array, and writes one row per post to an HBase table named posts. It uses postid_date as the row key; the info column family holds the category and user email qualifiers, and the likes column family holds a users qualifier (the list of users who liked the post, in JSON array format) and a numberoflikes qualifier, which we added ourselves to store the number of users who liked the post because it is needed in the second task. A sketch of the reducer is shown below.
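The reducer could be sketched as follows. The use of the org.json library for the JSON array and the exact qualifier names (category, user_email, users, numberoflikes under the info and likes column families) are assumptions based on the description above.

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.json.JSONArray;

public class PostLikesReducer extends TableReducer<Text, Text, NullWritable> {
    @Override
    protected void reduce(Text postId, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String category = "", date = "", userEmail = "";
        JSONArray likers = new JSONArray();

        for (Text v : values) {
            String[] parts = v.toString().split("\t", 2);
            if (parts[0].equals("POST")) {
                String[] post = parts[1].split(",");   // category,date,user_email
                category = post[0];
                date = post[1];
                userEmail = post[2];
            } else {                                   // LIKES record
                likers.put(parts[1]);
            }
        }

        // Row key = postid_date, as described above.
        Put put = new Put(Bytes.toBytes(postId.toString() + "_" + date));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("category"), Bytes.toBytes(category));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("user_email"), Bytes.toBytes(userEmail));
        put.addColumn(Bytes.toBytes("likes"), Bytes.toBytes("users"), Bytes.toBytes(likers.toString()));
        put.addColumn(Bytes.toBytes("likes"), Bytes.toBytes("numberoflikes"),
                Bytes.toBytes(String.valueOf(likers.length())));
        // Written to the HBase table "posts" configured in the job driver.
        context.write(NullWritable.get(), put);
    }
}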
To solve this problem we used one mapper, one combiner and one reducer class, where:
- The mapper class named PostCountAndAvgLikesMapper reads data from the HBase table and extracts the post category, the email of the user who made the post and the number of likes. It then constructs a composite key from the user email and the category, emits the composite key as the key and the number of likes as the value, and sends this data to the combiner for map-side aggregation. The result of this mapper is shown in the image below.
- The combiner class named PostCountAndAvgLikesCombiner takes the key-value pairs produced by the mapper, groups all posts that share the same composite key and sums their likes. It produces a new key-value pair with the same composite key, the count of posts of that category for that user and the sum of likes for that category. The result of the combiner class is shown in the image below.
- The reducer class named PostCountAndAvgLikesReducer then receives the aggregated post count and total number of likes per user and category, and computes the average number of likes as Average = total likes for the user's post category / count of posts in that category. It finally emits the composite key as the key and the count of posts, the total number of likes and the average likes as the values. The output of the reducer is shown in the image below; a sketch of the three classes follows.