The database contains salary data for the years 2009, 2011, 2018, and 2022, whereas the Excel file only contains data for 2011.
A database table needs an ID that uniquely identifies each row, which is called a primary key. We will talk more about this in class.
The Excel file does not have such an ID.
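To make the idea of a primary key concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `salaries`, its columns, and the two rows are hypothetical placeholders, not the actual schema or data of the course database.

```python
import sqlite3

# In-memory database, for illustration only (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE salaries (
        id     INTEGER PRIMARY KEY,  -- unique identifier for each row
        player TEXT    NOT NULL,
        year   INTEGER NOT NULL,
        salary INTEGER NOT NULL
    )
""")

# Made-up placeholder row.
conn.execute(
    "INSERT INTO salaries (id, player, year, salary) "
    "VALUES (1, 'Alex Example', 2011, 500000)"
)

# Reusing id = 1 violates the primary key, so the database rejects the row.
try:
    conn.execute(
        "INSERT INTO salaries (id, player, year, salary) "
        "VALUES (1, 'Sam Sample', 2011, 750000)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM salaries").fetchone()[0]
print(duplicate_rejected, row_count)
```

A spreadsheet has no such safeguard: nothing stops two rows from describing the same record.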
Finally, USA Today, which is the source of the data, has not been consistent in formatting the data across the years:
Starting in 2018, they swapped the order of the players' names, from First_name Last_name to Last_name, First_name.
In 2022, they used abbreviations for the players' positions.
Be creative with your queries, based on what you learned in DataCamp, to account for these differences in the input data.
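One possible way to handle the swapped name order is to rewrite "Last_name, First_name" back into "First_name Last_name" inside the query itself. The sketch below assumes a hypothetical `salaries` table with a `player` column and uses made-up placeholder rows; the real table and column names may differ. It runs the SQL through Python's sqlite3 module, and `instr` and `substr` are SQLite functions, so other database engines may need different string functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and placeholder rows, for illustration only.
conn.execute(
    "CREATE TABLE salaries (id INTEGER PRIMARY KEY, player TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO salaries (player, year) VALUES (?, ?)",
    [("Alex Example", 2011), ("Example, Alex", 2018)],
)

# If the name contains ', ', move the part after the comma to the front;
# otherwise keep the name unchanged.
query = """
    SELECT year,
           CASE
               WHEN instr(player, ', ') > 0
               THEN substr(player, instr(player, ', ') + 2) || ' ' ||
                    substr(player, 1, instr(player, ', ') - 1)
               ELSE player
           END AS full_name
    FROM salaries
    ORDER BY year
"""
rows = conn.execute(query).fetchall()
for year, full_name in rows:
    print(year, full_name)
```

The abbreviated positions in 2022 could be handled in a similar way, for example with a CASE expression or a small lookup table, once you have checked which abbreviations actually appear in the data.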
You need to uncover interesting findings in the data.
What is interesting? You name it. Some inspiration:
What is the percentage change in the average salary of a pitcher between 2011 and 2018?
Which teams have seen decreases in their total salary budgets across the years?
Which players have been around since 2009 in the same position?
Which players changed teams between 2018 and 2022?
... and more. The world is your oyster.
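As a starting point, the first question in the list could be approached roughly as follows. This is only a sketch: it reads the question as the percentage change in the average pitcher salary from 2011 to 2018, and the `salaries` table, its columns, the position label 'Pitcher', and all rows are made-up placeholders rather than the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and toy numbers, for illustration only.
conn.execute(
    "CREATE TABLE salaries ("
    "id INTEGER PRIMARY KEY, player TEXT, position TEXT, "
    "year INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO salaries (player, position, year, salary) VALUES (?, ?, ?, ?)",
    [
        ("Player A", "Pitcher", 2011, 100.0),
        ("Player B", "Pitcher", 2011, 300.0),
        ("Player A", "Pitcher", 2018, 250.0),
        ("Player B", "Pitcher", 2018, 350.0),
        ("Player C", "Catcher", 2018, 900.0),  # excluded by the WHERE clause
    ],
)

# AVG ignores NULLs, so each CASE expression averages one year only.
query = """
    SELECT 100.0 * (AVG(CASE WHEN year = 2018 THEN salary END)
                  - AVG(CASE WHEN year = 2011 THEN salary END))
                 / AVG(CASE WHEN year = 2011 THEN salary END) AS pct_change
    FROM salaries
    WHERE position = 'Pitcher'
"""
pct_change = conn.execute(query).fetchone()[0]
print(pct_change)
```

With the toy numbers, the average goes from 200 in 2011 to 300 in 2018, so the query reports a 50 percent increase. On the real data, the name and position inconsistencies described earlier would need to be handled first.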
The topic is writing queries in SQL.




