Using the provided population dataset, write a PySpark query to find and list all the countries where the female population is greater than the male population.
Input Data:
Expected Output:
Input DataFrame Script:
# Assumes a notebook environment where `spark` and `display` are already defined.
# Each row: (Country, TotalPopulation, MalePopulation, FemalePopulation)
data = [
    ("India", 1430000000, 734000000, 696000000),
    ("China", 1410000000, 723000000, 687000000),
    ("United States", 340000000, 167000000, 173000000),
    ("Indonesia", 277000000, 140000000, 137000000),
    ("Pakistan", 240000000, 122000000, 118000000),
    ("Brazil", 216000000, 106000000, 110000000),
    ("Nigeria", 223000000, 112000000, 111000000),
    ("Bangladesh", 173000000, 86500000, 86500000),
    ("Russia", 144000000, 67000000, 77000000),
    ("Mexico", 130000000, 64000000, 66000000),
]
schema = ["Country", "TotalPopulation", "MalePopulation", "FemalePopulation"]
fabricofdata_df = spark.createDataFrame(data, schema)
display(fabricofdata_df)
Please try to answer the question yourself and submit your solution in the comments section. If you are still unable to solve it, click the Show button below to see the answer.