Python PySpark Puzzle 3

SAS

April 27, 2025


Using the provided population dataset, write a PySpark query to find and list all the countries where the female population is greater than the male population.

Input Data:

(Table image not reproduced; the same rows are created by the Input DataFrame Script below.)

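Expected Output:

(Table image not reproduced.) In the input rows, four countries have more females than males: United States, Brazil, Russia and Mexico. Bangladesh does not qualify because its male and female populations are equal (86,500,000 each).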

Input DataFrame Script:

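from pyspark.sql import SparkSession

# Reuse the notebook's Spark session if one exists, otherwise start one
spark = SparkSession.builder.getOrCreate()

# Each row: (Country, TotalPopulation, MalePopulation, FemalePopulation)
data = [
    ("India", 1430000000, 734000000, 696000000),
    ("China", 1410000000, 723000000, 687000000),
    ("United States", 340000000, 167000000, 173000000),
    ("Indonesia", 277000000, 140000000, 137000000),
    ("Pakistan", 240000000, 122000000, 118000000),
    ("Brazil", 216000000, 106000000, 110000000),
    ("Nigeria", 223000000, 112000000, 111000000),
    ("Bangladesh", 173000000, 86500000, 86500000),
    ("Russia", 144000000, 67000000, 77000000),
    ("Mexico", 130000000, 64000000, 66000000),
]
schema = ["Country", "TotalPopulation", "MalePopulation", "FemalePopulation"]

fabricofdata_df = spark.createDataFrame(data, schema)

# display() is available in Fabric/Databricks notebooks;
# in a plain PySpark session use fabricofdata_df.show() instead.
display(fabricofdata_df)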

Please try to answer the question and submit your solution in the comments section.

Tags:

Python-PySpark Puzzles
