Problem Scenario 62 : You have been given below code snippet.
val a = sc.parallelize(List("dogM, "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1
Write a correct code snippet for operationl which will produce desired output, shown below. Array[(lnt, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
Answer is in the explanation below.
Reference / correct answer:
See the explanation for Step by Step Solution and configuration.
Solution :
b.mapValuesf'x" + _ + "x").collect
mapValues [Pair] : Takes the values of a RDD that consists of two-component tuples, and applies the provided function to transform each value. Tlien,.it.forms newtwo-componend tuples using the key and the transformed value and stores them in a new RDD.
Question 8
SIMULATION7
SIMULATION
Problem Scenario 37 : ABCTECH.com has done survey on their roducts feedback using a web based form. With the following free text field as input in web ui.
Name: String
Subscription Date: String
Rating : String
And servey data has been saved in a file called spark9/feedback.txt
Christopher|Jan 11, 2015|5
Kapil|11 Jan, 2015|5
Thomas|6/17/2014|5
John|22-08-2013|5
Mithun|2013|5
Jitendra||5
Write a spark program using regular expression which will filter all the valid dates and save in two separate file (good record and bad record)
Answer is in the explanation below.
Reference / correct answer:
See the explanation for Step by Step Solution and configuration.
Solution :
Step 1 : Create a file first using Hue in hdfs.
Step 2 : Write all valid regular expressions sysntex for checking whether records are having valid dates or not.
val regl =......(\d+)\s(\w{3})(,)\s(\d{4}).......r//11 Jan, 2015
val reg2 =......(\d+)(U)(\d+)(U)(\d{4})......s II 6/17/2014
val reg3 =......(\d+)(-)(\d+)(-)(\d{4})""".r//22-08-2013
val reg4 =......(\w{3})\s(\d+)(,)\s(\d{4})......s II Jan 11, 2015
Step 3 : Load the file as an RDD.
val feedbackRDD = sc.textFile("spark9/feedback.txt"}
Step 4 : As data are pipe separated , hence split the same. val feedbackSplit = feedbackRDD.map(line => line.split('|'))
Step 5 : Now get the valid records as well as , bad records.
val validRecords = feedbackSplit.filter(x => (reg1.pattern.matcher(x(1).trim).matches|reg2.pattern.matcher(x(1).trim).matches|reg3.pattern.matcher(x(1).trim).matches | reg4.pattern.matcher(x(1).trim).matches))
val badRecords = feedbackSplit.filter(x => !(reg1.pattern.matcher(x(1).trim).matches|reg2.pattern.matcher(x(1).trim).matches|reg3.pattern.matcher(x(1).trim).matches | reg4.pattern.matcher(x(1).trim).matches))
Step 6 : Now convert each Array to Strings
val valid =vatidRecords.map(e => (e(0),e(1),e(2)))
val bad =badRecords.map(e => (e(0),e(1),e(2)))
Step 7 : Save the output as a Text file and output must be written in a single tile,
case class Product(productid: Integer, code: String, name: String, quantity:lnteger, price: Float, supplierid:lnteger)
case class Suplier(supplierid: Integer, name: String, phone: String)
case class PRDSUP(productid: Integer.supplierid: Integer)
// create an RDD of Product objects
val prdRDD = products.map(_.split('\")).map(p => Product(p(0).tolnt,p(1),p(2),p(3).tolnt,p(4).toFloat,p(5).toint))
val supRDD = supplier.map(_.split(",")).map(p => Suplier(p(0).tolnt,p(1),p(2)))
val prdsupRDD = prdsup.map(_.split(",")).map(p => PRDSUP(p(0).tolnt,p(1}.tolnt}}
prdRDD.first()
prdRDD.count()
supRDD.first() supRDD.count()
prdsupRDD.first() prdsupRDD.count(}
// change RDD of Product objects to a DataFrame
val prdDF = prdRDD.toDF()
val supDF = supRDD.toDF()
val prdsupDF = prdsupRDD.toDF()
// register the DataFrame as a temp table prdDF.registerTempTablef'products")
supDF.registerTempTablef'suppliers")
prdsupDF.registerTempTablef'productssuppliers"}
//Select product, its price , its supplier name where product price is less than 0.6
val results = sqlContext.sql(......SELECT products.name, price, suppliers.name as sup_name FROM products JOIN suppliers ON products.supplierlD= suppliers.supplierlD WHERE price < 0.6......]
results. show()
Limited Time Offer – Save Now!
Don’t miss out — get full access at the best price.