


Using ST_DWithin to see duplicates

















I tried clustering with ST_ClusterDBSCAN (query below), but I think ST_DWithin might be better suited to this situation. What is the correct way to find duplicates, meaning rows that share the same n value, that lie within 10 km of each other?



    SELECT n,
           ST_ClusterDBSCAN(geog::geometry, eps := .08, minpoints := 2) OVER () AS cid
    FROM (
        SELECT *
        FROM cities AS t
        INNER JOIN (
            SELECT n AS dn
            FROM cities AS t
            GROUP BY n
            HAVING count(*) >= 2
        ) dups ON dups.dn = t.n
        ORDER BY t.n
    ) d


EDIT: I guess I would have to do something like this:



    SELECT *
    FROM (
        SELECT *
        FROM (
            SELECT *, ROW_NUMBER() OVER (ORDER BY t.n) AS rownum
            FROM cities AS t
            INNER JOIN (
                SELECT n AS dn
                FROM cities AS t
                GROUP BY n
                HAVING count(*) = 2
            ) dups ON dups.dn = t.n
            ORDER BY t.n
        ) d
        WHERE mod(rownum, 2) = 0  -- even-numbered rows
    ) even,
    (
        SELECT *
        FROM (
            SELECT *, ROW_NUMBER() OVER (ORDER BY t.n) AS rownum
            FROM cities AS t
            INNER JOIN (
                SELECT n AS dn
                FROM cities AS t
                GROUP BY n
                HAVING count(*) = 2
            ) dups ON dups.dn = t.n
            ORDER BY t.n
        ) d
        WHERE mod(rownum, 2) = 1  -- odd-numbered rows
    ) odd
    WHERE ST_DWithin(even.geog, odd.geog, 5000)  -- distance in metres for geography


But this is confusing. Maybe it's better to apply ST_DWithin first, but I'm not sure how to do that.



































  • Please add more context to your question. You have tried ST_ClusterDBSCAN to find duplicates, correct? Show us the select ST_DWITHIN statement you would use. If you find a duplicate, which one do you want to keep, and which one should be deleted, or do you want to delete both of them?

    – Michael
    27 mins ago











  • I'd like to keep the one with the fewest null values, and I'd like to copy the i column's value to the other row before deleting.

    – jaksco
    6 mins ago























postgis














asked 1 hour ago by jaksco, edited 11 mins ago























1 Answer














If your goal is just to identify duplicate records in your data, you can use the ST_DWithin function like this:

    -- Self-join: pair each row with other rows that lie within 10000 units
    SELECT c1.col1
    FROM cities AS c1
    INNER JOIN cities AS c2
        ON ST_DWithin(c1.geom, c2.geom, 10000)
    WHERE c1.gid != c2.gid       -- skip matching a row with itself
      AND c1.col1 = c2.col1      -- same value in the column that defines a duplicate

I assumed your data is in a projected coordinate system (units: meters) and has a unique gid column. Duplicates are matched on col1, which may be the name or any other value that should be unique within a 10 km radius.
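
As a rough sketch of how the same idea might look against the table from the question: this assumes cities has the n and geog columns shown there, that geog is of type geography (so ST_DWithin and ST_Distance work in metres with no projection needed), and that a unique gid column exists, which is an assumption and not something the question confirms. The output names gid_a, gid_b and dist_m are made up for illustration.

```sql
-- Sketch only: assumes cities(gid, n, geog), with gid a unique key (assumed)
-- and geog a geography column, so distances are in metres.
SELECT c1.gid AS gid_a,
       c2.gid AS gid_b,
       c1.n,
       ST_Distance(c1.geog, c2.geog) AS dist_m
FROM cities AS c1
JOIN cities AS c2
  ON c1.n = c2.n                           -- same name
 AND c1.gid < c2.gid                       -- each pair once, no self-pairs
 AND ST_DWithin(c1.geog, c2.geog, 10000)   -- within 10 km
ORDER BY c1.n, dist_m;
```

Using c1.gid < c2.gid instead of != returns each pair once rather than twice. A GiST index on geog should let ST_DWithin avoid comparing every row with every other row.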















answered 4 mins ago by Shahzad Bacha





























