Phase III needs to be done. The data analysis needs to be done using a program called JMP, which you can download from the internet. It does not need to be long: 3-5 pages total. I am also attaching Phases I and II as reference. The deadline is in 24 hours, so please complete it ASAP.
phase_2.docx

phase_iii.docx

qualitative_and_time_series_data_and_write_up.docx

Unformatted Attachment Preview

Dan Norgaard
Data Analysis for Managers
Summer 2016
PROJECT ASSIGNMENT – PHASE II
1. I am 90% confident that the true proportion of non-defective toners is contained in the
interval 72.6% to 77.9%, and 90% confident that the true proportion of defective toners is
contained in the interval 22.5% to 27.9%.
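Confidence intervals like the ones above come from the standard large-sample formula p̂ ± z·√(p̂(1−p̂)/n). A minimal sketch follows; the counts (n = 250 tested, x = 188 non-defective) are hypothetical stand-ins, since the actual sample data is in the attachments:

```python
import math

def proportion_ci(x, n, z):
    """CI for a population proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical counts; z = 1.645 for 90% confidence
low, high = proportion_ci(188, 250, 1.645)
print(f"90% CI for the non-defective proportion: {low:.3f} to {high:.3f}")
```

With the real counts from the Katun QA sample, this would reproduce the intervals stated above.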
2. We will use a margin of error of +/- .01. The .01 margin of error was determined based on
Katun's strict quality compliance requirements. The quality assurance lab, together with the QA
and compliance consulting group, determined that testing to a margin of error of .01 is required
to avoid significant logistical costs associated with recalling and reworking defective product, and
the short- and long-term loss of sales caused by products being defective at customer locations.
a. n = (1.96)²(.749)(.251)/(.01)² = 7,222.17. We will need a sample size of 7,223 to ensure a .01
margin of error at the 95% confidence level.
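The arithmetic in 2a can be checked with a short script implementing the standard sample-size formula n = z²·p̂(1−p̂)/ME², rounding up:

```python
import math

def sample_size(p_hat, margin, z):
    """Minimum n so a z-level CI for a proportion has half-width <= margin."""
    n = (z ** 2) * p_hat * (1 - p_hat) / margin ** 2
    return math.ceil(n)  # always round up to guarantee the margin

print(sample_size(0.749, 0.01, 1.96))  # 7223, matching the value computed above
```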
3. This uses different data than questions 1 and 2. A local official believes a particular
neighborhood will vote in favor of the referendum, with more than 50% of voters voting yes.
The official bases this on previous referendum votes and his knowledge of the community.
Referendum for public school funding:
Hypothesis test for referendum: one-tailed (upper tail), .01 level of significance
a. H0: p = .50 (exactly 50% of people will vote for the referendum).
b. HA: p > .50 (more than 50% of people will vote for the referendum).
c. α = .01. Reject H0 if z-calc > 2.326.
d. Normality check: np0 and n(1 − p0) are both ≥ 15: 61(.5) = 30.5 and 61(1 − .5) = 30.5.
e. n = 61 voters, x = 38, p̂ = 38/61 = .62295.
f. z-calc = (.62295 − .50)/√((.5)(.5)/61) = 1.92. Tabled value = .4726; p-value = .5 − .4726 = .0274.
g. Fail to reject the null hypothesis. The p-value would need to be < .01. At the .01 level of
significance, we cannot conclude that more than 50% of the people will vote for this referendum.
The election polling business is very important, and it is never more visible than in the fall of an
election year. Understanding and predicting outcomes as they relate to elected officials,
referendums, and specific propositions matters greatly, as these outcomes affect the flow of
money and ideas.
4. Data analysis for Q1 Sales (by customer) 2016: 95% confidence interval. These numbers
represent sales to Minnesota customers in Q1 of 2016. We can say with 95% confidence that
mean sales will fall within the interval $1,561 to $2,440 in a given first quarter of the year.
5. The experiment is set up to determine whether offering promotional swag (in this case, free
flashlights) will positively impact sales numbers at Katun Corp.
a. H0: μ_sales, flashlight = μ_sales, no flashlight
b. HA: μ_sales, flashlight ≠ μ_sales, no flashlight
c. α = .01.
d. Reject H0 if |t-calc| > 3.012 (t critical value with n − 1 = 14 − 1 = 13 degrees of freedom).
e. Two random samples. Unknown sigmas; the sigmas are not assumed to be equal. We do not
know the distribution of the populations, but by the Central Limit Theorem the sampling
distribution of the means should be approximately normal.
f. n1 = 17, n2 = 14. Difference in sample means = −1.895, standard error = .8349, |t-calc| = 2.26958, p-value = .0311.
g. Fail to reject the null hypothesis because our p-value of .0311 is greater than α = .01.
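The t statistic in 5f can be reproduced from the summary statistics alone (treating .8349 as the standard error of the difference in means, which is consistent with the reported t value):

```python
def t_from_summary(mean_diff, se_diff):
    """t statistic for a two-sample test, given the difference in sample means
    and the standard error of that difference."""
    return mean_diff / se_diff

t = t_from_summary(-1.895, 0.8349)
print(abs(t))  # |t| is about 2.27, below the 3.012 critical value, so fail to reject H0
```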
6. A Type I error would mean that flashlights had no positive impact on sales, but the researcher
rejects the null hypothesis and concludes that free flashlights help to increase sales revenue.
The cost of such an error would be the extra procurement and logistical costs associated with
the distribution of the free flashlights.
A Type II error essentially means the free flashlights are increasing sales, but the researcher
fails to reject the null hypothesis, concluding that they were not increasing sales. The business
implication of this type of error would be that the company stops using a productive sales tool,
and thus revenue and margin are negatively affected.
The costs of Type I vs Type II errors in this scenario are not extreme in either direction. Since
flashlights are cheap to purchase in bulk, and profit margins for toner cartridges have a mean of
73% gross margin, it would be better to risk a Type I error, than a Type II error.
***My company offers things like free flashlights, tools, and other accessory items to customers
making large purchases. So as silly as this sounds, it is a real-life situation.
In other real-world scenarios, we could look at the process of screening potentially pregnant
women before they receive x-rays. A Type I error would be harmful to the health of the
unborn child, so a very low alpha (significance level) would be set.
This Phase of the Project will incorporate the techniques of regression and quality tools as they apply to your data
and business applications. Your analysis can continue with the same data from Phases 1 & 2 of the Project, or you
may choose a more appropriate data set. The intent of this assignment is to develop a regression model to explain
the variability of your dependent variable (from your quantitative data – either cross-section or your time series
data) and to analyze your data with quality tools. Number your project with the following:
1) Begin your Project with an Executive Summary – This is a brief statement about the business applications
written for the professor of the course. Note – this can contain similar elements of previous phases’
Executive Summary paragraphs. Incorporate the computer output and graphs to complete your report.
Note that the computer output including graphs can be cut and pasted into the appropriate locations in the
word document. No appendices and no raw data are necessary.
2) Build the Theoretical regression model: With the quantitative data, identify 5-6 factors that would
influence (cause) the level of the quantitative data (dependent variable) to change. These should be
‘causal’ variables in an independent / dependent variable relationship. Specify the direction of the
relationship and explain why you would expect a positive or a negative relationship between each of the
independent variables and the dependent variable. For example, the Theoretical Economic Demand Model
attempts to explain variability in the demand for any good or service whereby Demand (sales) = f(price of
the product, tastes & preferences of consumers, consumer income, prices of related goods, number of
buyers in the market, price & income expectations). Price of the product is expected to have a negative
impact on sales (law of demand), consumers’ income will have a positive impact on sales for normal goods
and negative for inferior goods, etc. These are the ‘causal’ variables predicted by theory. These theoretical
variables are difficult to measure and difficult to gather measurements. Therefore, the search is on for
‘proxy’ or ‘stand-in’ variables that attempt to quantify the theoretical variables in the theoretical model.
For example, tastes & preferences of consumers cannot be measured perfectly so we attempt to find the
best data to quantify tastes & preferences of consumers. In many models it could be measured via
consumer surveys. In other models, the amount of advertising expenditures is used as a (poor) proxy
variable to quantify consumer’s tastes & preferences. As another example, the number of buyers in the
market cannot be measured perfectly so a proxy variable of population in the area is generally used. As
another example, the number of defects, errors, or failures in any process is generally influenced by the
quantity and quality of inputs in the process. Quality of inputs is rarely measured perfectly. For example,
quality of workers is typically measured based on the length of time on the job. Not a perfect measure of
worker quality.
3) Describe your ‘proxy’ variables. The proxy variables represent data series that can be quantified as an
attempt to measure the theoretical variables. Gather data on the proxy variables. A minimum of 2
independent variables must be gathered. Try for a minimum of 30 observations of all variables. Look for
quantitative continuous variables, not qualitative yes/no or discrete variables. The proxy variables
represent the data to collect, input into JMP and analyze. Data can be from your work. Be sure to follow
company guidelines on data. In many cases companies do not allow their data to be used outside of the
work environment. In which case I would suggest disguising the data or creating your own data to
illustrate the business applications. Data are also available on federal government websites. I would start
with www.fedstats.gov which is an alphabetical listing of many, many federal government websites.
4) Load the data series on the proxy variables into JMP. Construct the scatterplot matrix between each of the
independent variables and the dependent variable. Select one of the scatterplots and identify the four types
of information on the scatterplot. Note that the slope of the bivariate regression can be estimated by fitting
a best-fitting line into the scatterplot ‘by eye’, then estimate rise over run. Generate the correlation matrix.
Are the correlation coefficients between the dependent and independent variables significantly different
from zero? Test at the .05 level of significance for one of the correlation coefficients. Show all steps for
the relationship between the dependent variable and one of the independent variables. Copy the key for the
hypothesis test from the ALT-CAT Guidebook to get all steps, the graph, and the decision-criteria matrix.
Change the text boxes to your numbers. Use the p-value vs. alpha criteria to analyze all of the other
independent variables.
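For step 4, the significance test on a correlation coefficient uses t = r·√(n−2)/√(1−r²) with n − 2 degrees of freedom. A minimal sketch, where the r and n values are placeholders rather than the project data:

```python
import math

def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, tested with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Placeholder values: r = 0.45 from a sample of n = 30 observations
t = corr_t_stat(0.45, 30)
print(f"t = {t:.3f} on {30 - 2} df")  # compare |t| to the .05-level critical value (about 2.048 for 28 df)
```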
5) Build the Empirical regression model: In JMP, run the multiple regression based on the proxy variables. Interpret
the coefficient of determination R-sq. Test the significance of the overall global regression at the .05 level (show all
steps). Fully interpret all of the estimated regression slope coefficients. Test the significance of the individual
regression slope coefficients at the .05 level. Show all hypothesis test steps for one of the independent variables.
Then use the p-value vs. alpha criteria to test the significance of the other independent variables. No need to show
all steps for every independent variable. Make sure that all of your interpretations are in terms of the business
application and show all steps of the hypothesis test. My suggestion is to copy the relevant hypothesis test from the
ALT-CAT Guidebook. This provides a form for all steps including the decision criteria matrix and the graph.
Change the text boxes for your data. Provide a summary of your findings for upper management.
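JMP produces the regression output for step 5 directly. As a cross-check on what the slope and R-squared mean, ordinary least squares for a single predictor can be computed by hand; the sales-versus-advertising data below is a toy example, not the project data:

```python
def ols_simple(xs, ys):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx                      # estimated slope
    b0 = mean_y - b1 * mean_x           # estimated intercept
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return b0, b1, 1 - ss_res / ss_tot  # R-sq: share of variability explained

# Toy data: advertising spend vs. sales
b0, b1, r2 = ols_simple([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(b0, b1, r2)
```

An R-squared near 1 here would be interpreted, in business terms, as advertising spend explaining nearly all of the variability in sales for this toy data.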
6)
Construct either an Xbar R Chart or ImR chart. Your data should be quantitative data of at least 20
observations for the ImR chart or at least 20 observations made up of 2-5 data points for the X bar R chart.
Utilize the ‘Questioning Process of Control Charts’ as presented in the lecture notes. Is the process ‘in
control’ or ‘out of control’? Speculate on any data points that are ‘out of control’ – why are they out of
control? What actions would you recommend for improving the process?
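For step 6, the ImR (Individuals and Moving Range) chart limits can be computed without JMP: the individuals limits are x̄ ± 2.66·mR̄ and the moving-range UCL is 3.267·mR̄ (2.66 and 3.267 are the standard control chart factors for a moving range of size 2). The measurements below are toy values, not the project data:

```python
def imr_limits(data):
    """Control limits for an Individuals & Moving Range chart."""
    n = len(data)
    x_bar = sum(data) / n
    moving_ranges = [abs(data[i] - data[i - 1]) for i in range(1, n)]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "x_bar": x_bar,
        "individuals_ucl": x_bar + 2.66 * mr_bar,  # upper limit for individual points
        "individuals_lcl": x_bar - 2.66 * mr_bar,  # lower limit for individual points
        "mr_ucl": 3.267 * mr_bar,                  # upper limit for moving ranges
    }

# Toy measurements
limits = imr_limits([10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7])
print(limits)
```

Any observation outside the individuals limits would be flagged as ‘out of control’ and investigated for a special cause.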
DAN NORGAARD
PHASE I
7/11/2016
Qualitative Data
Bar Chart of Quality Defects:
1. Who: Collected by Katun Corp HQ QA Team
2. What: Defective samples collected
3. When: Time period is May 2016
4. Where: North American business unit
5. Why: Better understand where defects are occurring
Analysis: It is clear that the number one quality defect is toner that spills or leaks from the bottle. It
accounts for more than 50% of all quality defects in the month of May 2016 and obviously needs attention.
Computer chip error is the next largest quality issue and, at more than 20%, also demands attention.
Pie Chart of Quality Defects
1. Who: Collected by Corp HQ QA Team
2. What: Defective samples collected
3. When: Time period is May 2016
4. Where: North American markets
5. Why: Pie chart demonstrates proportions more clearly
Analysis: Similar to bar chart, it is clear that leaking toner requires immediate attention.
Time Series Data
Time Series Plot: North America Quarterly Sales
Analysis: The data show an overall decline in sales. The decline is attributed to contraction in the
copier and printer industry, driven by increased use of paperless devices and greater attention to
“go green” initiatives. Variability within the data indicates that Quarter 1 sales are strongest,
likely because companies make purchases with renewed budgets once the holiday slowdown is
over. Outliers include the 4th quarter of 2010 and the 1st quarter of 2011. I am not able to
disclose the reason for the increase in sales in Q4 2010 and Q1 2011.
Z score for Q1 2011 = (25,172,816 − 14,353,971)/2,993,541 = 3.614; tabled value = .49987. This is more
than three standard deviations from the mean.
Summary: The probability of a sales spike this large is .5 − .49987 = .0001 (rounded to the nearest
ten-thousandth). Wow!
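The outlier arithmetic above can be verified directly; the upper-tail probability uses the standard normal CDF via math.erf:

```python
import math

def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def upper_tail_prob(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

z = z_score(25_172_816, 14_353_971, 2_993_541)
print(f"z = {z:.3f}, P(Z > z) = {upper_tail_prob(z):.5f}")
```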
(Mean, median, any other pertinent information listed below)
