
Thread: EXCEL help needed

  1. #1
    Join Date
    Dec 2002
    Location
    SLC
    Posts
    428

    EXCEL help needed

    Here's my dilemma:

    I need to calculate the area of several land-use classifications based on property ownership records. It seems as though the database I am using has many duplicate records. I need to find the duplicates and delete them. I could, of course, go through each record individually, but there are 16,962 records and there should only be about 14,000. Handing it off to an intern is not an option at this point. Does anyone know if Excel can find the duplicate records? Any help would be appreciated.

  2. #2
    Join Date
    Sep 2004
    Location
    People's Republic of Boulder
    Posts
    796
    Well, this is still a bit manual, but... find a column with repeatable values, such as name, area, or maybe tract number. Then sort the data by that column so all the repeats end up together. Now you scroll through, easily spot the repeats, and delete them. Yeah, it's slow, and with 16k records it would suck, but this way should only take a few hours.

    We had Excel files with 25k records in them this summer and had to find bad data, and this is how we did it. I strongly recommend strong coffee and loud house music to help you focus on this tedious task.

  3. #3
    Join Date
    Feb 2004
    Location
    In the fields, under the yoke
    Posts
    3,342
    As far as I know, Excel doesn't specifically have any functionality that will delete duplicate records, but here's how I deal with it.

    1) Find the column/variable that identifies each line of data (let's call it "account number"). Note: if you have two or three fields that together identify an observation (for example, name and address), use a concatenate function to create the ID variable (concatenate(name,address)).
    2) Sort by that ID so duplicates sit next to each other, then create a new column that is essentially a counter: write a formula that checks whether the account number matches the account number above it. If it doesn't, counter = 1; otherwise counter = last counter value + 1 (see the sketch below).
    3) Apply an autofilter to the data, filter on your counter field, and select the 1s.
    4) Copy and paste that data into a new spreadsheet, and voila, duplicates eliminated.

    You'll probably want to spot-check it when you're done to make sure everything worked.
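    To make steps 1 and 2 concrete, here's a minimal sketch assuming name and address sit in columns A and B. The first formula builds the ID in C2, the second builds the counter in D2; fill the ID down, sort by column C so duplicates sit together, then fill the counter down:
    Code:
    =CONCATENATE(A2,B2)
    =IF(C2=C1,D1+1,1)
    With the data sorted by column C, every first occurrence gets a counter of 1 and every repeat counts up from there, so filtering column D for 1 keeps exactly one copy of each record.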

  4. #4
    Join Date
    May 2002
    Location
    Norte del río
    Posts
    2,215
    Do what Telechuck said, then follow it up by inserting a column (column A) next to the one you sorted by (column B). Then in row 2 of column A, put in the formula =IF(B2=B1,"1",""). Copy this formula all the way down the sheet and press the F9 key. All the duplicates should now have a "1" in column A. Copy column A and paste special with values. Sort by column A, then delete all the rows with a 1 (they should all be grouped together).
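    For example, with sorted tract numbers 1001, 1001, 1002 in rows 2-4 of column B (hypothetical values), the filled-down formula evaluates like this:
    Code:
    A2: =IF(B2=B1,"1","")  ->  ""    (first occurrence of 1001)
    A3: =IF(B3=B2,"1","")  ->  "1"   (duplicate of 1001)
    A4: =IF(B4=B3,"1","")  ->  ""    (first occurrence of 1002)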

    I'm bored.


    edit: or do what stump said.....
    Last edited by Hayduke; 09-22-2004 at 10:02 AM. Reason: Stump beat me to it....

  5. #5
    Join Date
    Nov 2003
    Location
    Joisey
    Posts
    2,614
    Quote Originally Posted by telechuck
    Well, this is still a bit manual, but... find a column with repeatable values, such as name, area, or maybe tract number. Then sort the data by that column so all the repeats end up together. Now you scroll through, easily spot the repeats, and delete them. Yeah, it's slow, and with 16k records it would suck, but this way should only take a few hours.

    We had Excel files with 25k records in them this summer and had to find bad data, and this is how we did it. I strongly recommend strong coffee and loud house music to help you focus on this tedious task.
    To make this suggested process a little faster: as telechuck suggests, sort by the column that you'll use to compare records, then in an adjacent blank column add a conditional statement like the following:
    Code:
    =IF(A2=A1,TRUE,"")
    This bit of code would go in cell B2 if comparing values in column A. Any duplicate value in column A would then show TRUE in column B.

    You could just manually delete each row that is TRUE.

    Alternatively, if you wanted to be a little more efficient... After getting the TRUE calculations, highlight the entire column, copy it to your clipboard, then paste just the values (paste special - values) back into the column (calculations are then removed). Re-sort the spreadsheet by column B and do a bulk delete of all the TRUE rows.

    EDIT: Or do what Hayduke said...
    Last edited by spanky; 09-22-2004 at 10:08 AM.

  6. #6
    Join Date
    Sep 2004
    Location
    champlain valley
    Posts
    5,826
    Use Access. If you don't know Access, then find someone at your office who does. If that is not an option, then PM me and we can work something out.

  7. #7
    Join Date
    Dec 2002
    Location
    SLC
    Posts
    428
    Thanks guys, but I figured out a better way. Highlight the column you want to filter, go to Data > Filter > Advanced Filter, and check "Unique records only". It eliminates any duplicates.

    edit: regarding Access: it's a database for some mapping software. I'm not sure if I'll lose any information converting it back and forth (DBF to Access file), but I'll play around with it to see.
    Last edited by cololi; 09-22-2004 at 11:11 AM.

  8. #8
    Join Date
    Sep 2004
    Location
    champlain valley
    Posts
    5,826
    Make a copy of your original data first, then give it a whack.

  9. #9
    Join Date
    Nov 2003
    Location
    Joisey
    Posts
    2,614
    Quote Originally Posted by cololi
    Thanks guys, but I figured out a better way. Highlight the column you want to filter, go to Data > Filter > Advanced Filter, and check "Unique records only". It eliminates any duplicates.

    edit: regarding Access: it's a database for some mapping software. I'm not sure if I'll lose any information converting it back and forth (DBF to Access file), but I'll play around with it to see.
    That doesn't eliminate the duplicates; it just hides the duplicate rows. They're still in the spreadsheet.

  10. #10
    Join Date
    Oct 2003
    Location
    bozone montuckey
    Posts
    4,337
    Save it as a CSV, then write a quick Perl or PHP script to parse the file and write each line to a new CSV only if the line isn't a duplicate.

    Not sure if it's the easiest way to do it, but it's what I'd do anyway.

    <edit> If it is actually a database instead of an Excel file, it's even easier and can be done with a SQL query (essentially a SELECT DISTINCT).
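    For illustration, here's a minimal sketch of that kind of script in Python rather than Perl or PHP. The file names are made up, and it treats the entire line as the duplicate key:
    Code:
    # Read the exported CSV and write each line to a new CSV
    # only if an identical line hasn't been seen before.
    import csv

    seen = set()
    with open("parcels.csv", newline="") as src, \
         open("parcels_dedup.csv", "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)          # whole record is the duplicate key
            if key not in seen:
                seen.add(key)
                writer.writerow(row)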
    Last edited by fez; 09-22-2004 at 12:40 PM.
    "They who can give up essential liberty to obtain a little temporary safety, deserve neither liberty nor safety."
    Ben Franklin

  11. #11
    Join Date
    Oct 2003
    Location
    SLC
    Posts
    696
    The lesson I learned in my MIS class was to never use a spreadsheet to store a database. It was a good lesson. Access will be way faster and more efficient for these types of things.
