Posted: August 26th, 2021
HIGHER COLLEGES OF TECHNOLOGY
Computer and Information Science
Non-Exam Based Assessment
Cover Sheet
Course Name | Statistics and Probability | Course Code | CIS 2003 |
Assessment | Statistics Project | Handing out Week | Week 9 |
To be submitted in: | Week 14 | ||
Maximum Marks | 85 Marks | Percentage of Final Grade. | 25% |
This assessment will assess the following Course learning outcomes:
CLO1 | CLO2 | CLO3 | CLO4 | CLO5 | ||
Question No. | √ | √ | √ | √ | √ | |
The entire project/case study/poster is designed and developed by me (and my team members). Proper citation has been used wherever I (and my team members) used other sources. No part of this project has been designed, developed, or written for me (and my team members) by a third party. I have a copy of this project in case the submitted copy is lost or damaged. None of the music/graphics/animation/video/images used in this project have violated the copyright/patent/intellectual property rights of an individual, company, or institution. Student Signature: Date: Student Signature: Date: | ||||||
For Examiner’s Use Only
Question No. | A Project Report | B 50% Weighted marks of Project Report | C Oral/viva | D 50% weighted scores of Oral/viva | Total Marks=B +D |
Marks Allocated | 70 | 50% | 15 | 50% | |
Student 1 | |||||
Student 2 | |||||
Student 3 |
Table of Contents
This assessment will assess the following Course learning outcomes:
d. Present Population data graphically using box plot and histogram.
B. Population mean at a Confidence level
C. Repeat (Part II a) for samples of sizes 31 and 45.
D. Widths of the above confidence level.
e. Check if all of your confidence intervals contain the μ.
B. Find the null and alternative hypotheses (H0 and H1)
Use a z test to make a conclusion.
Project Objectives:
Project Description:
Section I: Descriptive Statistics - Population Parameters and Sample Statistics (23 Marks)
Descriptive statistics are brief coefficients that summarize a data set, which may be a sample or the entire population. There are two broad categories of descriptive statistics: measures of central tendency and measures of spread (variability). Measures of central tendency indicate the central location of a data set and include the mean, median, and mode. Measures of spread describe how dispersed the data are and include the range, variance, standard deviation, and the minimum and maximum values; skewness and kurtosis describe the shape of the distribution. This paper is divided into sections that describe different statistical measures. The first section covers descriptive statistics, both measures of central tendency and measures of spread, and represents the data set with a histogram built in Excel. Part two tests the normality of the data using the histogram. The remaining parts estimate the population mean, carry out a hypothesis test, and offer a summary.
Data source: Dubai Statistics Center, https://www.dsc.gov.ae/en-us
The selected data set shows the iron levels in a person's body, measured in µg/dL; the column label gives the range as 60 – 160 µg/dL.
Table 1: Descriptive Statistics of the Population (Mean, Median, Mode, Variance and Standard Deviation)
Mean | 136.516 |
Median | 134.9 |
Mode | #N/A |
Variance | 3258.946781 |
Standard Deviation | 57.08718578 |
Maximum | 219.36 |
Minimum | 42.61 |
Range | 176.75 |
Skewness | -0.147151978 |
Kurtosis | -1.335204003 |
Count | 60 |
Figure 1: Histogram on Iron Levels
Figure 2: Histogram on Iron levels
Table 2: Descriptive Statistics for Iron, 60 – 160 µg/dL
Name | Iron, 60 – 160 µg/dL |
Fatma Ahmad Ali Khalfan | 188.96 |
Fatma Ahmad Ramadhan | 214.03 |
Fatma Nasser Hussain | 190.57 |
Fatma Salem Ahmad | 187.13 |
Fuad Ahmed Mohamed | 45.51 |
Hajar Humaid Salem | 135.54 |
Hana Hussain Essa | 87.01 |
Hanadi Mubarak | 156.5 |
Hanifa Abdulfatah Hassan | 140.31 |
Hassan Al Jaidi | 114.73 |
Hend Abdulla Mohammad | 212.09 |
Hessa Thani Mubarak | 57.34 |
Iman Hussain Abdulla | 71.68 |
Juma Sayed Abdulla | 108.97 |
Laila Ahmad Mohamed | 173.15 |
Descriptive Statistics | |
Mean | 138.9013 |
Median | 140.31 |
Mode | #N/A |
Variance | 2937.609 |
Standard Deviation | 54.19972 |
Maximum | 214.03 |
Minimum | 45.51 |
Range | 168.52 |
Skewness | -0.28206 |
Kurtosis | -1.21811 |
Count | 15 |
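The figures in Table 2 can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the name `describe` is illustrative and not part of the original report. One assumption worth stating: the tabulated variance and standard deviation match the divide-by-n formulas (`pvariance`, `pstdev`), so those are used here; the n − 1 formulas would give slightly larger values.

```python
import statistics

# Iron levels (µg/dL) for the 15 sampled people, copied from Table 2.
sample_15 = [188.96, 214.03, 190.57, 187.13, 45.51, 135.54, 87.01, 156.5,
             140.31, 114.73, 212.09, 57.34, 71.68, 108.97, 173.15]


def describe(values):
    """Return the summary measures reported in the descriptive-statistics tables."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # pvariance/pstdev divide by n; this is an assumption chosen because
        # it reproduces the tabulated variance and standard deviation.
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
    }


summary_15 = describe(sample_15)
for name, value in summary_15.items():
    print(f"{name}: {value:.4f}")
```

Every value in the sample is distinct, which is why Excel reports the mode as #N/A.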
The histogram in Figure 2 has a bimodal shape. The histogram has one peak at the center, and the other bars gradually decrease on the left and right sides of the peak. A bimodal shape suggests that the data may come from two different sources. If so, the two sources should be separated and analyzed individually before statistical conclusions are drawn about the population.
Explain clearly how you carried out sampling using this method. 5 Marks
The technique applied to sample the population is simple random sampling. Under this technique, each observation in the data set has an equal probability of being selected. The selected sample is meant to offer a fair representation of the entire population; any difference between a sample statistic and the corresponding population parameter is termed sampling error. The method was chosen because it is easy, straightforward, and unbiased, and it gives every record in the population an equal opportunity of being sampled. Accordingly, each iron-level record had an equal chance of being selected, and the sample was drawn at random from the overall population.
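Simple random sampling as described above can be sketched in a few lines. This is an illustration only, not the procedure used in the report: the function name `simple_random_sample`, the seed value, and the use of row numbers 1 to 60 as stand-ins for the population records are all assumptions.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct records, each equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, sample_size)


# Hypothetical stand-in for the 60 population records (row numbers 1..60).
population_ids = list(range(1, 61))
chosen = simple_random_sample(population_ids, 15, seed=2021)
print(sorted(chosen))
```

`random.sample` draws without replacement, so no record can appear twice in the sample.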
PART II: Checking the Normality of Population (6 Marks)
a. Read Pages 322-324 of the Textbook (Provided in Bbl). 0 Marks
b. Using the Histogram Constructed in Section Id, Calculate Pearson's Coefficient of Skewness and Check for Outliers. 5 Marks
The histogram in Section 1d is for the entire Population.
Table 3: Descriptive statistics and skewness of the Population as a whole
Mean | 136.516 |
Median (Q2) | 134.9 |
Mode | #N/A |
Variance | 3258.946781 |
Standard Deviation | 57.08718578 |
Maximum | 219.36 |
Minimum | 42.61 |
Range | 176.75 |
Skewness | -0.147151978 |
Kurtosis | -1.335204003 |
Count | 60 |
1st Quartile(Q1) | 85.9175 |
2nd Quartile (Q2) | 134.9 |
3rd Quartile (Q3) | 189.3625 |
IQR | 103.445 |
Pearson's Coefficient = 3(Mean - Median)/Standard Deviation | 0.084922736 |
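Outliers are identified from the first and third quartiles, given below. A value is an outlier if it lies below Q1 − 1.5 × IQR = 85.9175 − 155.1675 = −69.25 or above Q3 + 1.5 × IQR = 189.3625 + 155.1675 = 344.53. The minimum (42.61) and maximum (219.36) both lie inside these limits, so the population contains no outliers.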
Table 4: 1st and 3rd quartile
1st Quartile(Q1) | 85.9175 |
3rd Quartile (Q3) | 189.3625 |
d. Make a final comment (using the values/findings from the above step) about the normality of the Population. 1 Mark
The data range is 176.75, and the Pearson coefficient is 0.084922736, a small positive value. This implies that the distribution of the population is skewed only very slightly to the right. Because the coefficient lies well within −1 and +1, the skew is not significant.
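The skewness and outlier checks in this part reduce to two small formulas. The sketch below uses the population figures from Table 3; the function names are illustrative, and the 1.5 × IQR fence rule is assumed here as the common convention for flagging outliers.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Population figures copied from Table 3.
pc = pearson_skewness(136.516, 134.9, 57.08718578)
lower_fence, upper_fence = outlier_fences(85.9175, 189.3625)

# Checking the minimum and maximum is enough: if they are inside, all values are.
has_outliers = 42.61 < lower_fence or 219.36 > upper_fence

print(f"Pearson coefficient: {pc:.6f}")
print(f"Fences: {lower_fence:.2f} to {upper_fence:.2f}; outliers present: {has_outliers}")
```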
PART III: Estimation of Population Mean (20 Marks)
Sample size = 15
Standard deviation = 54.19972
Sample mean = 138.9013
Confidence level = 95%
Critical value (z) = 1.96
Confidence interval = 138.9013 ± 1.96 × 54.19972/√15 = 138.9013 ± 27.4288
= 111.4725 < µ < 166.3302
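A z-based confidence interval for the mean can be checked with the standard library. This is a sketch under stated assumptions: it takes the standard two-sided critical value for 95% confidence (about 1.96) from `statistics.NormalDist`, and it uses the sample standard deviation listed above; the function name is illustrative. With only 15 observations a t critical value is often preferred, which would widen the interval somewhat.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, std_dev, n, confidence=0.95):
    """Return (lower, upper) limits of mean ± z * std_dev / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * std_dev / sqrt(n)
    return mean - margin, mean + margin


# Sample of 15: mean and standard deviation as listed above.
lower_15, upper_15 = z_confidence_interval(138.9013, 54.19972, 15)
print(f"{lower_15:.4f} < mu < {upper_15:.4f}")
```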
Table 5: Sample size 31
Sample size 31 | |
Name | Iron, 60 – 160 µg/dL |
Aaeda Abdulaziz E | 177.09 |
Abdulla Ali Mohamed | 123.38 |
Abdulla Juma Jaffar Hussain | 182.93 |
Abdullah Khamis Mohamed | 107.34 |
Abdulrasool Saleh | 195.72 |
Abeer Rashed Saeed | 205.69 |
Afra Belal Ismail | 204.81 |
Ahlam Abdulla Mohamed | 217 |
Ahmad Abdulla Ahmad | 142.29 |
Ali Ahmed Sulaiman | 114.38 |
Ali Mohamed Ali | 112.46 |
Ali Mohammad Bin Tamim | 146.68 |
Amal Hassan Abdulla | 44.74 |
Amna Ali Saeed Mohammad | 210.77 |
Asma Saleh Jaafar | 118.52 |
Ayesha Abdulrazaq | 202.56 |
Aysha Yousif Matar | 55.21 |
Fatima Sulaiman Abdulla | 172.93 |
Fatma Ahmad Ali Khalfan | 188.96 |
Fatma Ahmad Ramadhan | 214.03 |
Fatma Nasser Hussain | 190.57 |
Fatma Salem Ahmad | 187.13 |
Fuad Ahmed Mohamed | 45.51 |
Hajar Humaid Salem | 135.54 |
Hana Hussain Essa | 87.01 |
Hanadi Mubarak | 156.5 |
Hanifa Abdulfatah Hassan | 140.31 |
Hassan Al Jaidi | 114.73 |
Hend Abdulla Mohammad | 212.09 |
Hessa Thani Mubarak | 57.34 |
Iman Hussain Abdulla | 71.68 |
Mean | 146.3194 |
Median | 146.68 |
Mode | #N/A |
Variance | 2938.378 |
Standard Deviation | 54.20681 |
Maximum | 217 |
Minimum | 44.74 |
Range | 172.26 |
Skewness | -0.4475 |
Kurtosis | -0.99975 |
Count | 31 |
1st Quartile(Q1) | 113.42 |
2nd Quartile (Q2) | 146.68 |
3rd Quartile (Q3) | 193.145 |
IQR | 79.725 |
Pearson's Coefficient = 3(Mean - Median)/Standard Deviation | |
PC= | -0.01996 |
Table 6: Sample size 45
Sample size 45 | |
Name | Iron, 60 – 160 µg/dL |
Ayesha Abdulrazaq | 202.56 |
Aysha Yousif Matar | 55.21 |
Fatima Sulaiman Abdulla | 172.93 |
Fatma Ahmad Ali Khalfan | 188.96 |
Fatma Ahmad Ramadhan | 214.03 |
Fatma Nasser Hussain | 190.57 |
Fatma Salem Ahmad | 187.13 |
Fuad Ahmed Mohamed | 45.51 |
Hajar Humaid Salem | 135.54 |
Hana Hussain Essa | 87.01 |
Hanadi Mubarak | 156.5 |
Hanifa Abdulfatah Hassan | 140.31 |
Hassan Al Jaidi | 114.73 |
Hend Abdulla Mohammad | 212.09 |
Hessa Thani Mubarak | 57.34 |
Iman Hussain Abdulla | 71.68 |
Juma Sayed Abdulla | 108.97 |
Laila Ahmad Mohamed | 173.15 |
Latifa Abdulla Husain | 42.61 |
Latifa Mubarak Bilal | 172.67 |
Lubna Mohamed Sharif | 123.64 |
Mahmood Ahmad Abdulrazzaq | 207.56 |
Marwa Yousuf Mahmoud | 134.26 |
Maryam Abdulhamid Moahmed | 183.36 |
Maryam Hamad Obaid | 76.75 |
Mira Ahmad Khalfan | 126.64 |
Moammer Abdulla Saed Salem | 210.95 |
Mohamed Ghanim | 217.94 |
Mohammad Ali Jamil Al Balooshi | 191.7 |
Mohammed Sulaiman Al Mehri | 54.68 |
Mona Ahmed Mohamed Ali | 181.81 |
Muna Ali Ahmad Mohammad | 111.07 |
Nadia Abdulla Mohamed | 52.73 |
Noora Ibrahim Mohaed Ahmad | 95.58 |
Rashed Abdulla | 219.36 |
Reem Omair Ali Omair | 56.05 |
Saeed Mubarak Rashid | 76.23 |
Saeed Salem Saeed | 82.64 |
Sara Abdulla Ibrahim Mohamed | 164.17 |
Sara Abdulla Mohamed Al Abdulla | 108.94 |
Sarah Ali Abdulla Ahmad | 43.52 |
Sulaiman Darwish | 53.46 |
Zainab Ali Ebrahim Ali | 197.76 |
Rahman Ali | 78.57 |
Muna Moammar | 108.29 |
Mean | 130.8258 |
Median | 126.64 |
Mode | #N/A |
Variance | 3430.175 |
Standard Deviation | 58.5677 |
Maximum | 219.36 |
Minimum | 42.61 |
Range | 176.75 |
Skewness | -0.00951 |
Kurtosis | -1.46653 |
Count | 45 |
1st Quartile(Q1) | 76.75 |
2nd Quartile (Q2) | 126.64 |
3rd Quartile (Q3) | 187.13 |
IQR | 110.38 |
Pearson's Coefficient = 3(Mean - Median)/Standard Deviation | |
PC= | 0.214407 |
Comment on this. 2 Marks
The width of a confidence interval is its upper limit minus its lower limit, which equals 2 × z × s/√n. For the 95% interval from the sample of 15, with z = 1.96, the limits are 111.4725 and 166.3302, giving a width of 54.8577 (166.3302 − 111.4725). Because the width is inversely proportional to √n, larger samples with a similar standard deviation give narrower intervals.
All the confidence intervals contain the population mean (µ = 136.516).
At the 95% confidence level, the population mean lies in 111.4725 < µ < 166.3302.
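The widths for the three sample sizes, and whether each interval contains the population mean, can be compared side by side. The sketch below assumes the standard 95% critical value (about 1.96) and takes each sample's mean and standard deviation from the tables above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

POPULATION_MEAN = 136.516  # from Table 1
Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

# (mean, standard deviation) per sample size, copied from the tables above.
samples = {
    15: (138.9013, 54.19972),
    31: (146.3194, 54.20681),
    45: (130.8258, 58.5677),
}

results = {}
for n, (mean, std_dev) in samples.items():
    margin = Z_95 * std_dev / sqrt(n)
    lower, upper = mean - margin, mean + margin
    results[n] = {
        "lower": lower,
        "upper": upper,
        "width": upper - lower,
        "contains_mu": lower < POPULATION_MEAN < upper,
    }
    print(n, results[n])
```

Under these assumptions the interval narrows as the sample size grows, and all three intervals contain the population mean.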
PART IV: Hypothesis Testing (16 Marks)
In this study, a research problem was stated to examine the iron levels in a person's body. The population consists of 60 people with an average iron level of 136.516 µg/dL and a population standard deviation of 57.08718578. A sample of 15 people was drawn and tested at the 95% confidence level (significance level α = 0.05). The sample mean iron level is 138.9013333. The test establishes whether the mean iron level differs from 60 µg/dL.
Hypothesis testing
H0: µ = 60
H1: µ ≠ 60
This is a two-tailed test, and the z-test is used to test the hypothesis:
Sample size = 15
Standard deviation = 54.19972
Sample mean = 138.9013
Confidence level = 95%
Critical values (z) = ±1.96
Test statistic: z = (138.9013 − 60)/(54.19972/√15) = 5.6381
95% confidence interval: 111.4725 < µ < 166.3302
Table 7: 2-tailed Normal Distribution
The test statistic lies in the rejection region (|z| > 1.96). Thus, H0 is rejected in favour of H1. At the 5% significance level (95% confidence), there is sufficient evidence that the mean iron level differs from 60 µg/dL.
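The decision rule for a two-tailed z-test can be verified numerically. This is a sketch, not the report's own calculation: it assumes H0: µ = 60 against the two-sided alternative µ ≠ 60, a 5% significance level, and the sample standard deviation listed above; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu_0, std_dev, n, alpha=0.05):
    """Return (z, critical value, p-value, reject H0?) for H0: mu = mu_0."""
    z = (sample_mean - mu_0) / (std_dev / sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


z_stat, z_crit, p_val, reject_h0 = two_tailed_z_test(138.9013, 60, 54.19972, 15)
print(f"z = {z_stat:.4f}, critical = ±{z_crit:.2f}, reject H0: {reject_h0}")
```

H0 is rejected whenever the absolute value of the test statistic exceeds the critical value, which is equivalent to the p-value falling below alpha.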
Project Report Format (5 marks)