Data Scientist Salary?

Graduation is a few months away, and I want to know what awaits me. The dataset I chose consists of Data Scientist job listings taken from Glassdoor.

Objectives

In what kinds of jobs do salaries tend to be higher? (Job Title, Salary Estimate)
Do salary levels differ according to a job’s location?
Which kinds of companies will pay a higher salary? (Company, Size, Industry, Revenue)

Limitations and assumptions:

Salary estimates are taken from Glassdoor and are not necessarily indicative of actual salaries.
The dataset was published in July 2020, so it only reflects the listings available during that period.

Hypothesis:

If people lost their jobs in the middle of the pandemic, then there should be job openings.

I started by importing the libraries, loading my data from a local file, and cleaning the dataset. I removed unnecessary columns using .drop(). The initial check showed no missing values.
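The loading and column-dropping step described above is not shown in the post, so here is a minimal sketch of what it might look like. The file name `DataScientist.csv`, the helper `load_listings`, and the dropped column names are assumptions for illustration, not taken from the actual notebook; an in-memory sample stands in for the real file.

```python
import io
import pandas as pd

def load_listings(source, drop_cols=("Unnamed: 0", "index")):
    """Read the listings CSV and drop columns that are not needed.

    `drop_cols` is a hypothetical list; adjust it to the real file.
    """
    ds = pd.read_csv(source)
    # errors="ignore" skips names that are not present in the file
    return ds.drop(columns=list(drop_cols), errors="ignore")

# Tiny in-memory stand-in for the real file (assumed name: DataScientist.csv)
sample_csv = io.StringIO(
    "index,Job Title,Salary Estimate\n"
    "0,Data Scientist,$111K-$181K (Glassdoor est.)\n"
    "1,Data Analyst,$60K-$90K (Glassdoor est.)\n"
)
ds_sample = load_listings(sample_csv)

# Count of missing values per column
missing_per_column = ds_sample.isnull().sum()
```

With the real file, the call would be something like `ds = load_listings("DataScientist.csv")`, followed by the same `isnull().sum()` check.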

# Remove the rating value from Company Name.
# str.split(separator, n): n is the maximum number of splits; the default -1 means all occurrences.
# Keep the first part (the name) and discard the second part (the rating).
ds['Company Name'] = ds['Company Name'].str.split('\n', n=1).str[0]

# Drop everything from the first "(" onward in Salary Estimate, keeping only the salary range.
ds['Salary Estimate'] = ds['Salary Estimate'].str.split('(', n=1).str[0]


# Exclude salaries quoted at an hourly rate
ds = ds[~ds['Salary Estimate'].str.contains(' Per Hour')].reset_index(drop=True)

# Split salary into two columns: min salary and max salary.
# lstrip removes leading characters; rstrip removes trailing characters
ds[['Min_Salary', 'Max_Salary']] = ds['Salary Estimate'].str.split('-', n=1, expand=True)
ds['Min_Salary'] = ds['Min_Salary'].str.strip(' ').str.lstrip('$').str.rstrip('K').fillna(0).astype('int')
ds['Max_Salary'] = ds['Max_Salary'].str.strip(' ').str.lstrip('$').str.rstrip('K').fillna(0).astype('int')


# For the analysis, summarize the salary as one number: Est_Salary = (Min_Salary + Max_Salary) / 2
ds['Est_Salary']=(ds['Min_Salary']+ds['Max_Salary'])/2

# Company size could be summarized the same way: Est_Size = (Min_Size + Max_Size) / 2 (not used here)
#ds['Est_Size']=(ds['Min_Size']+ds['Max_Size'])/2

# Separate 'City' & 'State' from job 'Location'
ds[['City', 'State']] = ds['Location'].str.split(', ', n=1, expand=True)


# Clean up State values that still contain an extra place name
ds['State']=ds['State'].replace('Arapahoe, CO','CO')
ds['State']=ds['State'].replace('Los Angeles, CA','CA')
ds['State']=ds['State'].replace('NY (US), NY','NY')
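To sanity-check cleaning steps like the ones above, it can help to run them on a few hand-written rows first. The sketch below is a variant, not the code used in this post: it swaps in a regular expression (`str.extract`) for the salary bounds and takes the state as whatever follows the last comma, which avoids listing the special cases one by one. The function name `clean_listings` and all sample rows are made up for illustration.

```python
import pandas as pd

def clean_listings(df):
    """Apply the cleaning steps to a copy of `df` and return it."""
    out = df.copy()
    # Keep only the company name (the text before the newline and rating)
    out["Company Name"] = out["Company Name"].str.split("\n", n=1).str[0]
    # Drop hourly listings, then cut the salary string at the first "("
    out = out[~out["Salary Estimate"].str.contains("Per Hour")].reset_index(drop=True)
    salary = out["Salary Estimate"].str.split("(", n=1).str[0]
    # Capture the two numbers in a string such as "$111K-$181K "
    bounds = salary.str.extract(r"\$(\d+)K\s*-\s*\$(\d+)K").astype(int)
    out["Min_Salary"], out["Max_Salary"] = bounds[0], bounds[1]
    out["Est_Salary"] = (out["Min_Salary"] + out["Max_Salary"]) / 2
    # City is the text before the first comma; State follows the last comma
    out["City"] = out["Location"].str.split(", ", n=1).str[0]
    out["State"] = out["Location"].str.rsplit(", ", n=1).str[-1]
    return out

# Made-up rows covering the cases handled above
sample = pd.DataFrame({
    "Company Name": ["Acme Corp\n3.5", "Example LLC\n4.1", "Hourly Inc\n3.0"],
    "Salary Estimate": [
        "$111K-$181K (Glassdoor est.)",
        "$60K-$90K (Glassdoor est.)",
        "$20-$30 Per Hour (Glassdoor est.)",
    ],
    "Location": ["New York, NY", "Greenwood Village, Arapahoe, CO", "Austin, TX"],
})
cleaned = clean_listings(sample)
```

Note that `astype(int)` would fail on a salary string that does not match the pattern, which is a useful early warning when trying this on the real data.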

I rechecked for missing values.

Job Title               0
Salary Estimate         0
Rating                405
Company Name            0
Location                0
Headquarters          240
Size                  229
Founded               970
Type of ownership     229
Industry              543
Sector                543
Revenue               229
Competitors          2743
Easy Apply           3725
Min_Salary              0
Max_Salary              0
Est_Salary              0
City                    0
State                   0
dtype: int64

As you can see in the output above, there are a lot of missing values. 'Easy Apply' and 'Competitors' have the highest number of missing values (> 50%).
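The "> 50%" figure can be checked directly by turning the missing-value counts into percentages. This is a minimal sketch; the helper name `missing_share` and the sample frame are made up, and with the real data the function would simply be called on `ds`.

```python
import numpy as np
import pandas as pd

def missing_share(df):
    """Return the percentage of missing values per column, highest first."""
    return (df.isnull().mean() * 100).sort_values(ascending=False)

# Made-up frame, only to show the shape of the result
sample = pd.DataFrame({
    "Job Title": ["Data Scientist", "Data Analyst", "Data Engineer", "ML Engineer"],
    "Easy Apply": [np.nan, np.nan, np.nan, "True"],
    "Competitors": [np.nan, "Example LLC", np.nan, np.nan],
})
share = missing_share(sample)

# Columns where more than half of the values are missing
mostly_missing = share[share > 50].index.tolist()
```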

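My assumption is that most companies have null values in Easy Apply because they are not hiring at the moment, although it may simply mean that the listing does not offer the Easy Apply option.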

To gauge my chances of finding a good job, I looked at a few aspects of the data using bar charts and other explanatory visualizations: the current openings, the top 20 cities with their minimum and maximum salaries, company size versus number of companies, and so on.

This is a bar graph of the current openings.

Current openings

The openings are distributed roughly uniformly across the top 10 companies. The chart shows the top 10 companies hiring for Data Scientist roles in 2020.
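A chart like this one can be built from a simple count of listings per company. The sketch below is one possible way to do it, not necessarily how the chart in this post was produced; the helper names `top_counts` and `plot_top_counts` and the sample rows are made up, and plotting assumes matplotlib is installed.

```python
import pandas as pd

def top_counts(df, column, n=10):
    """Count listings per value of `column` and keep the `n` most frequent."""
    return df[column].value_counts().head(n)

def plot_top_counts(counts, title):
    """Draw the counts as a bar chart (needs matplotlib installed)."""
    import matplotlib.pyplot as plt  # imported lazily so counting works without it
    ax = counts.plot(kind="bar", title=title)
    ax.set_ylabel("Number of listings")
    plt.tight_layout()
    plt.show()

# Made-up rows, only to show how the helper behaves
sample = pd.DataFrame({
    "Company Name": ["Acme Corp", "Acme Corp", "Example LLC",
                     "Acme Corp", "Example LLC", "Sample Inc"],
})
top_companies = top_counts(sample, "Company Name", n=2)
```

On the real data, `plot_top_counts(top_counts(ds, "Company Name"), "Current openings")` would draw the company chart, and passing `"Job Title"` instead gives the openings per job title.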

This is a chart of the minimum and maximum salaries.

salary

The minimum salary distribution is unimodal; the maximum salary distribution is asymmetric and bimodal.

From this, we learn that we have a greater chance of receiving a salary in the max salary range.
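For the location question, the minimum and maximum salaries can be summarized per city for the cities with the most listings. This is a sketch under assumptions: the choice to take the lowest `Min_Salary` and the highest `Max_Salary` per city is mine, the helper name `salary_range_by_city` is hypothetical, and the sample numbers (in thousands of dollars) are made up.

```python
import pandas as pd

def salary_range_by_city(df, n=20):
    """Lowest Min_Salary and highest Max_Salary for the `n` cities with the most listings."""
    top_cities = df["City"].value_counts().head(n).index
    subset = df[df["City"].isin(top_cities)]
    summary = subset.groupby("City").agg(
        Min_Salary=("Min_Salary", "min"),
        Max_Salary=("Max_Salary", "max"),
        Listings=("Min_Salary", "size"),  # number of rows per city
    )
    return summary.sort_values("Listings", ascending=False)

# Made-up rows, salaries in thousands of dollars
sample = pd.DataFrame({
    "City": ["New York", "New York", "Austin", "Austin", "Austin", "Chicago"],
    "Min_Salary": [100, 80, 60, 70, 90, 50],
    "Max_Salary": [150, 120, 100, 110, 140, 90],
})
summary = salary_range_by_city(sample, n=2)
```

Calling `summary[["Min_Salary", "Max_Salary"]].plot(kind="bar")` on the result would give a grouped bar chart per city, assuming matplotlib is available.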

This is a bar graph of the job titles with the most openings that are currently hiring.

Job openings

This chart is not skewed to the right, as the job titles are not correlated with one another.

In conclusion, the bar graph indicates that the most openings are in the Data Science field, so as a new graduate looking to be hired, I'll start with that field.
