
Thread: Statistics Question

  1. #1
    Join Date
    Oct 2003
    Location
    Under the bridge, down by the river
    Posts
    4,881

    Statistics Question

    I'm trying to go through a data set and find which values are 'outliers'. I'm not sure of the best way to do this. I first thought of ANOVA, then a simple boxplot, but I suspect those might not work because the data is not really Gaussian, and I don't know much about nonparametric tests.

    Experiment Background:
    Basically a test is administered to 5 subjects. I want to see if a given question seems to be a problem for multiple subjects. The only thing is the scores vary wildly between subjects, as if a genius and a 3 year old were taking the same test: a score on a given question that would be fantastic for the 3 year old would be really bad for the genius. But I want to see if there is a way to compare each question to see if a bad score for one subject (compared to his overall results) is also a bad score for another subject, compared to her overall results. Does that make sense?

    Ideas?

  2. #2
    Join Date
    Mar 2004
    Location
    Mammoth/Santa Barbara
    Posts
    1,497
    Quote Originally Posted by CantDog View Post
    a test is administered to 5 subjects.
    You're gonna need more subjects. Do you have the subjects categorized into groups?

    Quote Originally Posted by CantDog View Post
    I want to see if a given question seems to be a problem for the multiple subjects.
    Problem? What do you mean by this? This looks to be your null hypothesis.

    Quote Originally Posted by CantDog View Post
    so a good score on a given question
    Good?

  3. #3
    Join Date
    Dec 2005
    Location
    SLC
    Posts
    1,488
    Sounds like you want to know if any particular question is hard or easy for everybody. Which means you want to eliminate the individual effect and see what the question effects are. So you should rescale the question scores so the test scores are equal for all individuals and then see if there is any correlation for scores on particular questions. Which there will be, of course.
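    A minimal sketch of that rescaling idea, assuming higher scores are better and that converting each subject's scores to z-scores (mean 0, standard deviation 1 within the subject) is an acceptable way to put everyone on the same footing. The scores and subject labels below are made up for illustration only.

```python
from statistics import mean, pstdev

# Hypothetical scores: one row per subject, one column per question.
scores = {
    "A": [90, 85, 60, 95],
    "B": [40, 35, 10, 45],
    "C": [70, 75, 50, 65],
    "D": [20, 25, 5, 30],
    "E": [55, 60, 30, 55],
}

def standardize(row):
    """Rescale one subject's scores to mean 0 and standard deviation 1."""
    mu = mean(row)
    sigma = pstdev(row)
    return [(x - mu) / sigma for x in row]

# Remove the individual effect: each subject is now measured against themselves.
z = {subject: standardize(row) for subject, row in scores.items()}

# Average the rescaled scores per question to estimate the question effect.
n_questions = len(next(iter(scores.values())))
question_effect = [mean(z[s][q] for s in z) for q in range(n_questions)]

# The question with the lowest average rescaled score is the candidate "bad" question.
worst = min(range(n_questions), key=lambda q: question_effect[q])
print(worst, [round(e, 2) for e in question_effect])
```

    In this toy data the third question (index 2) is the lowest score for every subject, so it comes out as the worst regardless of how strong each subject is overall.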

    BTW, if you don't know the distribution from which the scores are drawn, there is no such thing as an outlier.

  4. #4
    Join Date
    May 2006
    Location
    Trying hard to stay in the present moment
    Posts
    933
    More subjects first, identify outliers later.
    Try to keep two ideas in your head at the same time without blowing your brains out your ass.

  5. #5
    Join Date
    Nov 2005
    Location
    Under the snow
    Posts
    1,589
    Quote Originally Posted by David Witherspoon View Post
    BTW, if you don't know the distribution from which the scores are drawn, there is no such thing as an outlier.
    That's true as far as it goes. But as the population increases it becomes more obvious which data points are unduly influencing the overall results. Eliminating these may, or may not, be appropriate. For example when looking at aviation incidents in California ...
    1. Salton Sea (Death Valley) turns out to be the least safe airport when incidents are matched to operations. But there's been all of one incident in the last 20 years.
    2. Equivalently, San Francisco International turns out to be the safest because of the immense number of operations compared to incidents. But there's next to no general aviation there, and general aviation pilots are the ones who seem to have the death wish.
    Are those two examples outliers or part of the distribution? I, as a statistician by training, would eliminate Salton Sea because of the small dataset of incidents and operations and bitch about the excessive influence San Francisco (and LAX) have on the overall distribution model.

  6. #6
    Join Date
    Dec 2005
    Location
    SLC
    Posts
    1,488
    Quote Originally Posted by TruckeeLocal View Post
    That's true as far as it goes.
    I am always willing to give my full and unreserved support to tautologies.
    Not too much else.

    Now if I were an economist ...

  7. #7
    Join Date
    Nov 2005
    Location
    Making the Bowl Great Again
    Posts
    13,817
    83% of all statistics are made up.

  8. #8
    Join Date
    Dec 2005
    Location
    SLC
    Posts
    1,488
    ... can you give me a confidence interval on that?

  9. #9
    Join Date
    Nov 2005
    Location
    Making the Bowl Great Again
    Posts
    13,817
    No, but the standard deviation is eleventy brazilian.

  10. #10
    Join Date
    Oct 2005
    Location
    Wasatch
    Posts
    6,253
    Quote Originally Posted by David Witherspoon View Post
    Now if I were an economist ...
    I feel like this should offend me, but ... it doesn't.

  11. #11
    Join Date
    Dec 2005
    Location
    SLC
    Posts
    1,488
    C'mon ... please? Surely there must be some hope? Just a little?

  12. #12
    Join Date
    Oct 2003
    Location
    Under the bridge, down by the river
    Posts
    4,881
    Ok...

    The 'subjects' are monkeys, so 5 is okay as far as # of subjects go.

    Basically the test is that the monkeys learn to replicate a repeated pattern of sounds (like the game Simon Says) and I record the number of trials it takes to correctly match the pattern. We have 150 patterns. If a monkey has a particular problem on, say, pattern number 15, I want to see if the other monkeys have problems on that pattern too (i.e. a bad pattern), or if it's just a dumb monkey. Some of the monkeys do remarkably better than others, so the analysis has to account for hugely different means in the number of trials required to learn the pattern between subjects. That's why I was thinking nonparametric.
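    One nonparametric way to sketch this, offered as an assumption rather than a recommendation from the thread: rank the patterns within each monkey by trials-to-learn, then average those ranks across monkeys. Ranking within a monkey throws away the monkey's overall level, so a fast learner and a slow learner are compared only on which patterns were relatively hard for them. This is the same within-subject ranking step that the Friedman test builds on. The trial counts and monkey names below are hypothetical, and only 5 patterns are shown instead of 150.

```python
from statistics import mean

# Hypothetical trials-to-learn: one row per monkey, one column per pattern.
trials = {
    "monkey_1": [3, 4, 12, 2, 5],
    "monkey_2": [30, 45, 90, 25, 40],
    "monkey_3": [8, 7, 20, 6, 9],
    "monkey_4": [15, 18, 60, 12, 14],
    "monkey_5": [5, 6, 9, 4, 7],
}

def average_ranks(values):
    """Rank values from 1 (fewest trials) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions in the run
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks

# Rank patterns within each monkey, then average the ranks per pattern.
ranks = {m: average_ranks(row) for m, row in trials.items()}
n_patterns = len(next(iter(trials.values())))
mean_rank = [mean(ranks[m][p] for m in ranks) for p in range(n_patterns)]

# A pattern with a high mean rank was relatively hard for most monkeys.
hardest = max(range(n_patterns), key=lambda p: mean_rank[p])
print(hardest, mean_rank)
```

    A pattern that is hard for only one monkey gets a high rank in that one row and ends up with a middling mean rank, which separates "bad pattern" from "dumb monkey".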

  13. #13
    Join Date
    Dec 2003
    Location
    here
    Posts
    2,129
    There were three of us today standing around a hub assembly that would not come off and none of us could agree on what a monkey wrench is. Does this help?
    If it weren't for serendipity, there'd be no dipity at all

  14. #14
    Join Date
    Oct 2003
    Location
    Under the bridge, down by the river
    Posts
    4,881
    Were there 5 monkey wrenches?

  15. #15
    Join Date
    Oct 2006
    Location
    Milpitas, CA
    Posts
    2,805
    Quote Originally Posted by CantDog View Post
    Were there 5 monkey wrenches?
    For 3 people?! We're not all rich like you.

  16. #16
    Join Date
    Dec 2003
    Location
    here
    Posts
    2,129
    Quote Originally Posted by CantDog View Post
    Were there 5 monkey wrenches?
    no, but there were 3 monkeys




    and we were having a problem with pattern #1
    Last edited by train07; 08-22-2007 at 06:07 PM.
    If it weren't for serendipity, there'd be no dipity at all

  17. #17
    Join Date
    Dec 2005
    Location
    SLC
    Posts
    1,488
    Take a look at a 3d plot of the first three principal components of the 150 questions, treating the 5 monkeys as observations (that's not a lot of observations) and the questions as variables. If the distribution is obviously multimodal, then some patterns were harder to learn than others, and there's a detectable difference between the groups. If the distribution just has a long tail in some direction, then there were harder and easier questions, but no clear line between them. If you still want to draw a line through the distribution, you'll need a null hypothesis of what that distribution shoulda looked like if all questions were similarly difficult.
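    A rough sketch of one reading of this suggestion, using only the Python standard library. Everything about the data is an assumption: the trial counts are simulated as a per-monkey factor times a per-pattern difficulty plus noise, and the names (`skill`, `difficulty`, `top_components`) are hypothetical. Because there are only 5 observations, the components are taken from the 5 x 5 matrix of monkey-by-monkey products instead of the 150 x 150 covariance matrix. The plotting itself is left out; the `loadings` list holds the 150 points you would put in the 3d plot.

```python
import math
import random

rng = random.Random(0)
N_MONKEYS, N_PATTERNS = 5, 150

# Hypothetical data: trials = monkey factor * pattern difficulty + noise.
skill = [1.0, 1.5, 2.0, 3.0, 5.0]
difficulty = [rng.uniform(2, 10) for _ in range(N_PATTERNS)]
data = [[s * d + rng.gauss(0, 1) for d in difficulty] for s in skill]

# Center each variable (pattern) across the observations (monkeys).
col_means = [sum(row[j] for row in data) / N_MONKEYS for j in range(N_PATTERNS)]
X = [[row[j] - col_means[j] for j in range(N_PATTERNS)] for row in data]

# 5 x 5 Gram matrix X X^T; its eigenvectors give the principal components.
G = [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def top_components(gram, k, iters=2000):
    """Power iteration with deflation: approximate the k largest eigenpairs."""
    m = [row[:] for row in gram]
    n = len(m)
    out = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
        out.append((lam, v))
        # Deflate: subtract the component just found before looking for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return out

components = top_components(G, 3)

# Scores: where each monkey sits on each component (5 points in 3-D).
scores = [[math.sqrt(lam) * u[i] for lam, u in components] for i in range(N_MONKEYS)]

# Loadings: how each pattern contributes to each component (150 points in 3-D).
loadings = [[sum(X[i][j] * u[i] for i in range(N_MONKEYS)) / math.sqrt(lam)
             for lam, u in components] for j in range(N_PATTERNS)]
```

    One caveat with this sketch: centering each pattern across monkeys removes that pattern's average trial count, so it may be worth rescaling within each monkey first and comparing both views.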

  18. #18
    Join Date
    Mar 2007
    Location
    arcata
    Posts
    1,265
    One more question pertaining to stats (didn't want to start a new thread)

    Boys 18-24 have a mean height of 70.1 in. with an SD of 2.7 in., and the heights are normally distributed.

    How would I figure the height that 90% of the boys are under? (Only 10% of boys are taller than this height.)

    I am not looking for an answer, just how to go about figuring it.
    thanks
    whatever I feel like i what to do!

  19. #19
    Join Date
    Aug 2004
    Location
    New Haven Line heading north
    Posts
    2,956
    i usually look at the data and yell, "Which one of you motherfuckers is an outlier!" clears things up pronto.
    Charlie, here comes the deuce. And when you speak of me, speak well.

  20. #20
    Join Date
    Oct 2005
    Location
    Wasatch
    Posts
    6,253

  21. #21
    Join Date
    Mar 2007
    Location
    arcata
    Posts
    1,265
    Thanks. I got that far. I guess it is easiest to use the normal distribution functions in Excel.
    The answer I got was 73.6 in.
    whatever I feel like i what to do!

  22. #22
    Join Date
    Jan 2008
    Location
    Live Free or Die
    Posts
    1,289
    Sounds like you want to do a cumulative frequency or exceedance probability calculation:
    http://en.wikipedia.org/wiki/Cumulat...uency_analysis

  23. #23
    Join Date
    Mar 2006
    Location
    Missoula, MT
    Posts
    22,997
    Quote Originally Posted by Sirshredalot View Post
    This.678
    No longer stuck.

    Quote Originally Posted by stuckathuntermtn View Post
    Just an uneducated guess.

  24. #24
    Join Date
    Mar 2007
    Location
    Hyperspace!
    Posts
    1,416
    Z-score

    z(90%)=(x-mean)/sd

    z(90%) being the z-score below which 90% of the area under the probability density function lies

    z for 90% can be found by looking it up in a z-table (you have to read the table somewhat backwards to do this) or, better, since you have Excel, by using normsinv(0.9)
    z = 1.28

    For your question you then have
    1.28 = (x-70.1)/2.7
    which you have successfully solved.
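    The same calculation can be checked outside Excel. This is a small sketch using Python's standard-library `statistics.NormalDist`, with the mean and SD taken from the question and the heights assumed to be in inches.

```python
from statistics import NormalDist

mean_height = 70.1  # inches, from the question
sd_height = 2.7     # inches, from the question

# z-score with 90% of the standard normal area below it.
z90 = NormalDist().inv_cdf(0.90)

# Solve z = (x - mean) / sd for x.
height90 = mean_height + z90 * sd_height

# Same result in one step, using the height distribution directly.
direct = NormalDist(mean_height, sd_height).inv_cdf(0.90)

print(round(z90, 2), round(height90, 1))  # prints: 1.28 73.6
```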

  25. #25
    Join Date
    Feb 2008
    Location
    New States
    Posts
    837
    For ideas of how to address the OP problem you might want to look at the literature on analysis of 'panel data'. This methodology seems like it might be appropriate to your problem.

    Regarding the search for outliers in data from 'odd' distributions: I usually try to do some sort of bootstrap analysis. These methods can be a bit tricky to use, particularly for a 'panel data' type situation like you have, and they aren't always accepted by some academic communities. They are very robust when properly applied, though, and can be helpful during the exploratory analysis phase.
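    For a flavor of what a bootstrap looks like, here is a deliberately simple sketch: a percentile interval for the median of one subject's trial counts, built by resampling the observed values with replacement. It is not the poster's specific method, it ignores the panel structure entirely, and the data and the `bootstrap_ci` helper are hypothetical.

```python
import random
from statistics import median

def bootstrap_ci(values, stat=median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic of one sample."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lo_index = round((alpha / 2) * n_resamples)
    hi_index = round((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_index], estimates[hi_index]

# Hypothetical, skewed trial counts for one subject (note the single value of 40).
trials = [3, 4, 4, 5, 5, 6, 6, 7, 8, 9, 12, 40]

low, high = bootstrap_ci(trials)
print(median(trials), (low, high))
```

    No distributional assumption is needed here, which is the appeal for 'odd' data. Values far outside an interval like this are candidates for a closer look, not automatic outliers.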
    "I just want to thank everyone who made this day necessary." -Yogi Berra
