Thread: Statistics Question
-
07-12-2012, 07:03 PM #26
Plot your data using a qq plot. Any outliers should be obvious.
Consider using a log transformation to "fix" your data if it is skewed.
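For anyone without stats software handy, the Q-Q idea can be sketched with only the Python standard library: pair each sorted value with the matching normal quantile, and points far off the line are outlier candidates. The scores below are made up for illustration, with 95 as the planted outlier.

```python
# Q-Q sketch: compare sorted scores against the quantiles of a normal
# distribution fit to the data. Large gaps flag outlier candidates.
# The scores are made up for illustration; 95 is the planted outlier.
from statistics import NormalDist, mean, stdev

scores = [62, 65, 66, 68, 70, 71, 73, 74, 76, 95]
n = len(scores)
m, s = mean(scores), stdev(scores)

sample_q = sorted(scores)
# theoretical quantiles at plotting positions (i + 0.5) / n
theory_q = [m + s * NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

gaps = [abs(a - b) for a, b in zip(sample_q, theory_q)]
# the biggest gap here belongs to the 95, the outlier candidate
```

The log transformation mentioned above is just `math.log` applied to each score before replotting.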
Sent from my DROIDX using TGR Forums
-
07-12-2012, 07:27 PM #27
Thanks for the helpful ideas.
The wiki links just remind me that I have no clue what I'm doing.
whatever I feel like i what to do!
-
07-13-2012, 10:34 AM #28
I might use a regression model with multiple variables. This will allow you to find the effect of the question independent of variables such as age and intelligence.
You will come up with a model which is something like this:
y = b1*x1 + b2*x2 + b3*x3, where each b value is the change in y given a one-unit change in its x (x1, x2, or x3) while all other variables remain the same. This will allow you to test the difficulty of a given question independently of factors such as age.
An example might be home prices. You might have a y variable, which is home price, and how it is a function of a location variable(x1) plus a square footage variable(x2) plus a number of bedrooms variable(x3). This would look like:
y = b1*x1 + b2*x2 + b3*x3, which would give you your predicted y for given scores of x1, x2, x3.
The coefficient vector would be b = (X^T X)^(-1) X^T y, which gives you b1, b2, and b3.
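As a concrete illustration of the normal equations, here is a pure-Python sketch. The home-price numbers are made up, and y is generated from known coefficients so the fit can be checked against them.

```python
# Sketch of the normal-equations fit b = (X^T X)^(-1) X^T y, in pure Python
# with a tiny Gaussian-elimination solver. The home-price data are made up:
# y is generated as exactly 10*x1 + 15*x2 + 5*x3 so the fit is checkable.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """b = (X^T X)^(-1) X^T y via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# x1 = location score, x2 = square footage (100s), x3 = bedrooms (made up)
X = [[3, 12, 2], [5, 20, 3], [4, 15, 3], [2, 10, 2], [5, 25, 4], [3, 18, 3]]
y = [220, 365, 280, 180, 445, 315]  # price in $1000s, = 10*x1 + 15*x2 + 5*x3
b = ols(X, y)
# b ≈ [10.0, 15.0, 5.0], recovering the coefficients y was built from
```

Since y was generated from exactly those coefficients, getting them back is a handy self-check on the solver.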
You could do this to show if certain questions have lower scores, relative to the ability of the test takers.
EDIT: I looked at your question again, and my solution is neither necessary nor efficient. It would probably be easier to just figure out the average score and standard deviation for each subject, and then test each score using a normal table. This will show you how far off each test score is from the mean, that is, how many standard deviations away it is. If most of the questions are pretty clustered and one is pretty far off, this will show that. You could also do a box plot for each subject, which would not be too hard given that you only have 5 subjects, and look for consistencies. If a question really is a bit of an outlier, you should be able to see it consistently in all of the plots.
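The per-subject z-score idea above takes only a few lines of Python. The scores here are made up for illustration, with question 4 deliberately low in every subject so it shows up as the consistent outlier.

```python
# z-score each question within each subject and flag questions more than
# 1.5 standard deviations from that subject's mean. Scores are made up;
# question 4 is deliberately low everywhere.
from statistics import mean, stdev

scores = {  # subject -> scores on questions 1-6
    "math":    [82, 85, 80, 45, 84, 81],
    "reading": [78, 75, 77, 40, 79, 76],
    "science": [90, 88, 91, 50, 89, 92],
}

flagged = {}
for subject, qs in scores.items():
    m, s = mean(qs), stdev(qs)
    flagged[subject] = [i + 1 for i, q in enumerate(qs) if abs(q - m) / s > 1.5]

# flagged -> {"math": [4], "reading": [4], "science": [4]}: question 4 is a
# consistent outlier across all subjects
```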
You could also try to find a parametric distribution for the data, using maximum likelihood estimation, and then test the fit using a pp plot, but that could be tough to do. It might not be easy to find a distribution that fits your data, so it might not be worth your while. You only have 5 subjects, so finding the questions which are outliers should not be all that difficult. I suspect a box plot will show what you need.
Even easier plots might help too. Just plotting question score on the y and question number on the x, and doing this for each subject, will show you if any questions are consistent outliers.
Last edited by Long duc dong; 07-13-2012 at 11:11 AM.
"Have you ever seen a monk get wildly fucked by a bunch of teenage girls?" "No" "Then forget the monastery."
"You ever hear of a little show called branded? Arthur Digby Sellers wrote 156 episodes. Not exactly a lightweight." Walter Sobcheck.
"I didn't have a grandfather on the board of some fancy college. Key word being was. Did he touch the Filipino exchange student? Did he not touch the Filipino exchange student? I don't know Brooke, I wasn't there."
-
07-24-2012, 01:23 PM #29
Sorry guys......
One quick question: I have to decide whether this calls for a paired t-test or an independent t-test.
I think it is independent, but I can't be wrong or I will lose tons of points on the following calculations.
Rental car prices were compared between company X and company Y in ten different cities.
The car is a small compact.
city  company X  company Y
a         5          3
b         4          5
c         6          5
d         3          5
e         5          4
f         6          3
g         7          4
h         5          7
i         2          4
j         4          6
This is just a generic replica.
Thanks.
-
07-24-2012, 01:36 PM #30
-
07-24-2012, 06:56 PM #31
The fact that the data are pulled from two independent sources makes your assumption correct.
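Either way, both statistics are easy to compute by hand from the numbers posted above. Here is a stdlib-Python sketch computing both for comparison (worth noting: the two prices in each row share a city, which is the usual argument for treating such a design as paired):

```python
# Paired and independent (pooled) t statistics for the rental-price data
# above, computed with only the standard library.
from math import sqrt
from statistics import mean, stdev

x = [5, 4, 6, 3, 5, 6, 7, 5, 2, 4]  # company X, cities a-j
y = [3, 5, 5, 5, 4, 3, 4, 7, 4, 6]  # company Y, cities a-j
n = len(x)

# paired: one-sample t-test on the per-city differences (df = n - 1)
d = [xi - yi for xi, yi in zip(x, y)]
t_paired = mean(d) / (stdev(d) / sqrt(n))

# independent: pooled two-sample t-test (df = 2n - 2)
sp2 = ((n - 1) * stdev(x) ** 2 + (n - 1) * stdev(y) ** 2) / (2 * n - 2)
t_indep = (mean(x) - mean(y)) / sqrt(sp2 * (1 / n + 1 / n))

# both statistics are well under any usual critical value from a t table,
# so neither test finds a significant price difference in this data
```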
-
07-25-2012, 02:35 AM #32
Oh, can I join in here?
I have dates of when the snow melts at several different loggers for about 10 years. Snowmelt seems to be shifting to earlier dates as time goes on for all loggers. I was told to do an analysis of covariance to see if the trend is significant. There was no significance using a t-test of the correlation between snowmelt date and year. I used snowfall data for each winter (same for all loggers) to group the years into 3 categories (little snow, normal, and a lot of snow) and used that as a covariate. I get the attached results (for the mean date of snowmelt over all loggers). I have zero idea what I am doing. Can someone tell me if there is some fundamental flaw in my thinking? Not enough data (10 years x 8 loggers)? ANCOVA not applicable because ??? Trying to make up our own statistics?
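Not an ANCOVA per se, but one way to sanity-check the covariate-adjusted trend in Python: demean melt date and year within each snow category, then regress the residuals. The melt dates below are made up, constructed so the true category-adjusted trend is exactly -2 days per year.

```python
# Covariate-adjustment sketch: demeaning within each snow category removes
# the category effect, and regressing the residuals gives the adjusted year
# trend. Melt dates are made up as 150 - 2*(year - 2000) plus a snow-category
# offset, so the "true" adjusted trend is -2 days/year.
from statistics import mean

years = list(range(2000, 2010))
snow_cat = ["little", "normal", "lot", "normal", "little",
            "lot", "normal", "little", "lot", "normal"]
offset = {"little": -10, "normal": 0, "lot": 12}
melt = [150 - 2 * (y - 2000) + offset[c] for y, c in zip(years, snow_cat)]

def demean(vals, cats):
    """Subtract each category's mean from its members."""
    m = {c: mean(v for v, cc in zip(vals, cats) if cc == c) for c in set(cats)}
    return [v - m[c] for v, c in zip(vals, cats)]

ry = demean(melt, snow_cat)   # melt dates with category effect removed
rx = demean(years, snow_cat)  # years with category effect removed
slope = sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)
# slope ≈ -2.0 days per year here by construction
```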
Edit: hm, I can't get my exciting plot to show up...
I only ask you, don't wake me.
-
07-25-2012, 08:50 AM #33
Are the loggers together and susceptible to the same weather patterns? If so, it's probably treated as 1 environment x 10 years. Can you go back any farther? More years would help. You could also do 1940-1950 versus the current 10-year period and see if there's significant deviation.
Decisions Decisions
-
07-25-2012, 09:56 AM #34
Thanks for the input! They are reasonably close together in a treeless, high alpine, fairly dry environment (a couple of sq km between 2000 and 3300 m asl) but differ in aspect, exposure to wind, etc. While subject to the same general weather patterns, the snowpack at each spot is really more defined by how much sun/wind/skier traffic it sees (i.e., a logger under a groomer has snow longer than a very high logger on an exposed, south-facing rock face, even though both get the same weather). I used the accumulated snowfall from a nearby weather station as the covariate in the ANCOVA to eliminate the "weather" effect (at least that's what I think I did). The loggers have only been in place since 2000, and any older data I could dig up would not really be comparable because snow is so location specific. It does seem like a whole lot of random weirdness for all loggers to be showing that trend. I am a total statistics idiot. I feel like my brain ties itself in knots every time I try some educational reading.
-
04-07-2014, 03:51 PM #35
I'm currently doing my senior project (soil science). It involves comparing two wetlands: one is a reference, and one is a wetland that would be the same except that the hydrology has been changed. Caltrans is our project site for mitigation work on HW101.
We have taken samples from the Caltrans site (larger), and also from the reference site. There are more samples from the Caltrans site. We have already done the lab work on bulk density, organic matter, and salinity. We have a data set for both sites. I have sucked at stats in the past, and continue to suck at stats.
I think the correct test here is just a simple t-test to see if there is a significant difference between bulk density in the reference compared to the Caltrans site, etc.
Questions: Excel won't do these tests on two different-sized data sets. So do I use a random number generator to remove some points from the larger set? I don't want to do this and would rather include all the data.
Is there a better statistical test that could be used?
Is there a more interesting way to compare the sites that I am not aware of?
Thanks, and also feel free to rag on me again wrong way, I don't mind.
-
04-07-2014, 04:05 PM #36
Just to be an ass, I'll note that it is a somewhat large assumption to assume that the reference wetland and the study plot "would be the same." Not necessarily bad, but something to note if this project involves report writing/technical writing.
Oftentimes, I've found that something many science/engineer types miss is being clear about (or even realizing) the assumptions required to do their study the way it was done.
Oh, and I don't remember enough of my stats work in Excel, but I think Matlab would work around the issue using some matrix manipulation steps that I also don't remember; that's probably not helpful, is it?
-
04-07-2014, 04:12 PM #37
Yes, there are a few things missing:
1. What is your overall question/hypothesis? This will determine your methods/analyses.
2. Was this set up as a BACI (before-after, control-impact) design, or was the impact already in place when the project started?
3. Assumptions of normality: use a qq plot, histogram, and/or Kolmogorov-Smirnov test to examine normality (if the plots look reasonable, it probably is).
4. If the data are roughly normal, you can do an F-test for equal variance between the two locations; if the variances are equal, you can use a t-test adjusted for unequal sample sizes.
Yes, you were supposed to plan the analyses before starting the project...
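On point 4 and the earlier Excel question: an independent t-test does not require equal sample sizes, so nothing needs to be thrown out of the larger data set. A stdlib-Python sketch of Welch's t-test, which handles unequal n directly and also drops the equal-variance assumption (the bulk-density numbers are made up for illustration):

```python
# Welch's t-test in pure Python: works with unequal sample sizes (and unequal
# variances) directly, so nothing has to be dropped from the larger data set.
# The bulk-density numbers (g/cm^3) are made up for illustration.
from math import sqrt
from statistics import mean, variance

ref = [1.2, 1.4, 1.1, 1.3, 1.5, 1.2, 1.6, 1.0]  # reference site, n = 8
imp = [0.9, 1.0, 1.1, 0.8, 1.0]                 # Caltrans site, n = 5

va, vb = variance(ref) / len(ref), variance(imp) / len(imp)
t = (mean(ref) - mean(imp)) / sqrt(va + vb)

# Welch-Satterthwaite degrees of freedom for looking up the p-value in a table
df = (va + vb) ** 2 / (va ** 2 / (len(ref) - 1) + vb ** 2 / (len(imp) - 1))
```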
-
04-07-2014, 04:20 PM #38
This does involve report writing/tech writing, but for this study they would have been roughly the same. The reference and our project site are/were estuarine wetlands (tidal salt marsh) along the Mad River Slough. Our site had a levee installed 80 years ago to keep it from flooding and to convert it to pasture. It still floods enough to be considered a wetland under the Army Corps of Engineers' three-parameter approach to delineating wetlands. Easily. The difference is that because the main input of water now is rain, not tide, it is now classified as a freshwater wetland, and the salinity has largely been removed by leaching from rainfall.
You bring up a good point, though. One that I questioned for half of the semester.
-
04-07-2014, 04:31 PM #39
Wetlands accumulate organic matter in an anaerobic environment. We propose that the reference wetland will have accumulated more organic matter than our project site because of the reduction of flooding due to the levee. We also propose that if there is higher organic matter in the reference, there will be lower bulk density. We also propose that without tidal influence, the salinity of the soil will be reduced by leaching.
The levee went in 80 years ago, so if I am understanding the question, the impact was in place when the project started.
Roger that.
I planned to use simple t-tests to detect significant differences between the sites; however, I am wondering if there is anything that could be improved or modified.
-
04-07-2014, 04:33 PM #40
A back-of-the-envelope calculation I often do is to compute the 95% confidence intervals for each population and see if they overlap. I think Excel can handle this. I've even used this in some low-level reports with associated histograms because it's an easy way to visualize the data for the statistics-impaired.
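That check is a one-liner per site in Python; here is a sketch using a normal-approximation interval (1.96 standard errors; a t critical value would be slightly wider at these sample sizes). The organic-matter percentages are made up for illustration.

```python
# 95% confidence interval for each site's mean (normal approximation), then
# check whether the intervals overlap. Organic-matter percentages are made up.
from math import sqrt
from statistics import mean, stdev

def ci95(data):
    m = mean(data)
    half = 1.96 * stdev(data) / sqrt(len(data))  # 1.96 SEs each side
    return (m - half, m + half)

ref = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5, 13.8, 12.2]  # reference site
imp = [8.2, 7.9, 8.8, 7.5, 8.4, 9.0]                    # leveed site

lo_r, hi_r = ci95(ref)
lo_i, hi_i = ci95(imp)
overlap = lo_r <= hi_i and lo_i <= hi_r
# False for these numbers: the intervals are disjoint, so the quick check
# suggests a real difference between the sites
```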
-
04-07-2014, 10:18 PM #41