Assignment Guideline
This assignment consists of a series of questions that require individual answers. Carefully follow the instructions provided below to ensure successful completion:
• Read the instructions thoroughly before proceeding to answer the questions.
• Respond to each question separately and in the same order as they appear in these instructions.
• Number the questions clearly.
• The submission format should be a single “Markdown” file containing both your code and appropriate documentation for each response.
• Present each question in your Markdown file followed by the corresponding code for your answer (check the screenshot below).
• Enhance clarity by using proper documentation and comments in your code.
• Submit a single file that encompasses all the questions.
The attached screenshot is a sample of a proper Markdown file. Please ensure your submission adheres to the following format:
Answer the following questions using the nycflights13 dataset that was used in the “week 2” lecture notes. This dataset contains details of over 330,000 flights that departed from the three primary New York airports (JFK, LGA, and EWR) to 105 distinct destinations during 2013. The dataset provides 19 attributes for each flight; the main ones are briefly explained below. For more information, look up the dataset’s documentation.
• year, month, day: Date of Departure
• dep_time, arr_time: actual departure and arrival times. They are written in the format of HHMM. For example, 554 means 5:54 AM, and 1354 means 1:54 PM.
• sched_dep_time, sched_arr_time: scheduled departure and arrival times. They are written in the format of HHMM.
• dep_delay, arr_delay: departure and arrival delays, in minutes. Negative values indicate early departures/arrivals.
• carrier: Two-letter carrier abbreviation. It represents the name of the airline.
• flight: Flight Number.
• tailnum: plane tail number.
• origin, dest: Origin and destination of the flights. The origin is one of the three airports in New York. The destination can be any of the 105 listed airports in the dataset.
• air_time: amount of time spent in the air, in minutes.
• distance: distance between the destination and origin airports, in miles.
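Because the time columns are stored as HHMM integers rather than as clock times, subtracting two of them does not give a duration in minutes. The following base R sketch shows one way to convert such values to minutes since midnight. The helper name hhmm_to_minutes is hypothetical and is not part of the dataset or of any package.

```r
# Convert HHMM-encoded values (as in dep_time) to minutes since midnight.
# hhmm_to_minutes is a hypothetical helper name used only for illustration.
hhmm_to_minutes <- function(hhmm) {
  hours   <- hhmm %/% 100  # integer division drops the last two digits
  minutes <- hhmm %% 100   # the remainder keeps the last two digits
  hours * 60 + minutes
}

# 554 is 5:54 AM and 1354 is 1:54 PM, as in the examples above.
hhmm_to_minutes(c(554, 1354))
# 354 834
```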
Questions
Question 1.1. Which 3 carriers have the highest average departure delay? (Calculate the average departure delay of all carriers and print the name of the top three along with their corresponding average delay value).
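The "average per group, then keep the top three" pattern recurs in several questions. The sketch below illustrates it on a small made-up table rather than on the real flights data, assuming the dplyr package is installed. The table toy and its values are invented for illustration. Note that dep_delay contains missing values in the real dataset (for example, for cancelled flights), so the sketch passes na.rm = TRUE to mean().

```r
library(dplyr)

# Made-up data standing in for the flights table (values are illustrative only).
toy <- tibble(
  carrier   = c("AA", "AA", "B6", "B6", "UA", "UA", "DL", "DL"),
  dep_delay = c(10,   NA,   30,   20,   5,    9,    -5,   1)
)

top3 <- toy %>%
  group_by(carrier) %>%
  # na.rm = TRUE skips missing delays; without it, AA's mean would be NA.
  summarise(avg_dep_delay = mean(dep_delay, na.rm = TRUE)) %>%
  arrange(desc(avg_dep_delay)) %>%  # largest average first
  head(3)                           # keep the first three rows

print(top3)
```

On the toy data this keeps B6, AA, and UA, in that order. For a "lowest average" question, the same pattern applies with arrange() in ascending order.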
Question 1.2. What is the average departure delay of each of these three carriers in every month? (Calculate the average departure delay of the top three carriers (from Q1.1) per month. Store the results of each carrier in a separate vector. Create a new table by putting these three vectors side-by-side (use mutate) and print the table).
Question 1.3. Which three months have the highest average departure delay? (Calculate the average departure delay for each month and print the top three months with their corresponding average departure delay values.)
Question 1.4. Which carriers (airlines) offer flights from all three airports (JFK, LGA, EWR)? (Find the list of unique carriers at each airport using the unique() function. Then find the carriers that appear in the lists of all three airports.)
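The unique-then-intersect idea in the hint can be sketched in base R on a small made-up table. The data frame toy and its values are invented for illustration, and using split() with Reduce() is just one possible way to combine the per-airport lists.

```r
# Made-up data standing in for the flights table (values are illustrative only).
toy <- data.frame(
  origin  = c("JFK", "JFK", "LGA", "LGA", "EWR", "EWR", "EWR"),
  carrier = c("AA",  "B6",  "AA",  "B6",  "AA",  "B6",  "UA"),
  stringsAsFactors = FALSE
)

# One vector of unique carriers per airport.
per_airport <- lapply(split(toy$carrier, toy$origin), unique)

# Reduce() applies intersect() pairwise across the three vectors,
# leaving only the carriers present at every airport.
common <- Reduce(intersect, per_airport)
common
# "AA" "B6"
```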
Question 1.5. What are the top three destinations with the lowest average departure delay? (Calculate the average delay based on the destinations and print the top three with lowest delay along with their corresponding average delay values).
Question 1.6. Suppose you want to fly from New York to Tampa, Florida (TPA). Which airport is expected to have the lowest average delay? Which carrier has the lowest delay? Finally, if you were to select a carrier from one of the three airports with the lowest average flight delay for trips to Tampa, which carrier and airport would you choose? (The dest code for Tampa is “TPA”. Filter your dataset based on the destination and keep those with TPA as the destination. Calculate the average delay from each airport to Tampa. For the second question, calculate the average delay from New York (irrespective of the airport) to Tampa with respect to the carriers. For the final question, you need to first filter your data based on the destination and keep TPA. Then group the remaining dataset with respect to carrier and origin. Pick the carrier and the airport with the lowest average delay.)
Question 1.7. Assuming a penalty of $1000 for each delay exceeding 30 minutes, how much penalty has each of the three airports paid due to flight delays per month? Which airline has caused the highest penalty to JFK? (Filter your dataset to keep only those rows in which the delay is more than 30 minutes. Group your dataset by month and airport. Count the number of such delays per airport per month and multiply the results by 1000).
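The filter, count, and multiply steps in the hint can be sketched on a small made-up table, assuming the dplyr package is installed. The question does not say whether "delay" means departure or arrival delay; the sketch assumes dep_delay, and the table toy and its values are invented for illustration.

```r
library(dplyr)

# Made-up data standing in for the flights table (values are illustrative only).
toy <- tibble(
  month     = c(1, 1, 1, 2, 2),
  origin    = c("JFK", "JFK", "LGA", "JFK", "LGA"),
  dep_delay = c(45,    10,    31,    90,    NA)  # assumption: departure delay
)

penalties <- toy %>%
  filter(dep_delay > 30) %>%                    # filter() also drops rows where the delay is NA
  count(month, origin, name = "n_delays") %>%   # number of long delays per month and airport
  mutate(penalty = n_delays * 1000)             # $1000 per delay over 30 minutes

print(penalties)
```

On the toy data, three month and airport combinations each have one qualifying delay, so each is assigned a penalty of 1000.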
Question 1.8. Imagine an investigator needs a list of all flights that departed from JFK between 5:30 AM and 11:00 AM from March 11 to March 21. Generate a list that contains these flights. Furthermore, identify the destination with the highest number of these flights. (Filter your dataset based on the specified date and time. Filter the new dataset to keep JFK as the origin. Finally, count the number of flights to each destination and keep the one with the highest number of flights.)
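Since times are stored as HHMM integers, a time window such as 5:30 AM to 11:00 AM can be expressed as a plain numeric range (530 to 1100). The sketch below shows this on a small made-up table, assuming the dplyr package is installed. It assumes the actual departure time (dep_time) is meant rather than the scheduled one, and the table toy and its values are invented for illustration.

```r
library(dplyr)

# Made-up data standing in for the flights table (values are illustrative only).
toy <- tibble(
  month    = c(3,     3,     3,     4,     3,     3,     3),
  day      = c(11,    15,    22,    12,    21,    12,    13),
  dep_time = c(600,   1230,  700,   800,   1100,  900,   545),
  origin   = c("JFK", "JFK", "JFK", "JFK", "JFK", "LGA", "JFK"),
  dest     = c("LAX", "SFO", "LAX", "LAX", "SFO", "LAX", "LAX")
)

selected <- toy %>%
  filter(
    origin == "JFK",
    month == 3,
    between(day, 11, 21),         # between() includes both endpoints
    between(dep_time, 530, 1100)  # HHMM integers compare correctly as numbers
  )

# Count flights per destination, most frequent first, and keep the top row.
top_dest <- selected %>%
  count(dest, sort = TRUE) %>%
  head(1)

print(top_dest)
```

On the toy data, three rows pass the filter and LAX is the most frequent destination with two flights.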
Question 1.9. Use bar charts to plot the distribution of the distance between each airport (origin) and all destinations. (You should have three separate bar charts, one for each origin (airport). You can generate three separate datasets, one for each airport). Is there a meaningful difference in the patterns of these three bar charts? Why?
Question 1.10. Consider a scenario in which an airline is obligated to pay a tax of $1 for each mile of flight. Calculate the total tax that United Airlines (UA) has paid per month. Additionally, plot the tax amount for each destination for UA. (To calculate the tax per flight for UA, filter the dataset and keep the records belonging to UA. Then create a new column for the tax amount per flight using the distance column and the tax rate. Then calculate the total tax per month. For the visualization, use a scatterplot with the tax amount per flight (as calculated above) on the y-axis and the destinations on the x-axis.)