Remove duplicate rows in CSV file

mohammad Alsajri

23 Jul 2019

1 Answer

Answer Accepted

Updated 25 Jul 2019

0 votes

Hello dear MathWorkers,

I have a dataset consisting of approximately 4 million records, and I want to remove the duplicate rows. Can anyone help me with how to do this? I am using MATLAB R2018a. Thanks in advance.

7 Comments

madhan ravi on 24 Jul 2019

Mohammed: Alex's solution should have solved your problem.

mohammad Alsajri on 25 Jul 2019

Thanks for the help, guys.


Accepted Answer

Alex Sune on 23 Jul 2019


1 vote

Since all of the data is numeric, you can use:

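data = xlsread('kdd.xlsx');       % read the numeric data from the spreadsheet
datanew = unique(data,'rows');    % keep one copy of each distinct row (result is sorted)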
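The question title mentions a CSV file, while the snippet above reads an Excel workbook. A minimal sketch of the same idea for a numeric-only CSV with no header row might look like the following. The file names and the small demo matrix are hypothetical stand-ins, and the 'stable' option is an addition (not part of the answer above) that keeps the first occurrence of each row in its original order instead of sorting.

```matlab
% Build a small demo CSV so the sketch runs on its own (stand-in for the real file).
demo    = [1 2 3; 4 5 6; 1 2 3; 7 8 9; 4 5 6];
inFile  = fullfile(tempdir, 'demo_dup.csv');      % hypothetical input file name
outFile = fullfile(tempdir, 'demo_unique.csv');   % hypothetical output file name
dlmwrite(inFile, demo, 'precision', '%.15g');     % comma-delimited by default

% Read, de-duplicate, and write back.
data = csvread(inFile);                           % assumes numeric data, no header row
[datanew, ia] = unique(data, 'rows', 'stable');   % first occurrences, original order kept
fprintf('Removed %d duplicate rows.\n', size(data,1) - size(datanew,1));
dlmwrite(outFile, datanew, 'precision', '%.15g'); % explicit precision avoids the short default format
```

For the demo matrix this leaves three rows, and ia holds the indices of the rows in data that were kept. If the file has a header row or text columns, readtable followed by unique on the resulting table would be the more likely route.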

2 Comments

Shameer Parmar on 23 Jul 2019

This is not working, because none of the data is similar. I don't find any duplicate entries in the sheet provided by Mohammad Alsajri.

Using your command, 'data' and 'datanew' come out exactly the same.

Alex Sune on 23 Jul 2019

This code works!

I guess the Excel file provided by Mohammad is just a small portion of the dataset (4 million rows).
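One way to settle whether a given sample contains duplicates at all is to count them directly. The sketch below uses a toy matrix as a stand-in for the real data and relies on the third output of unique. It is only an illustration, not something taken from the posted file. As a side note, an .xlsx worksheet holds at most 1,048,576 rows, so a 4-million-row dataset could not sit on a single sheet, which fits the guess that the shared workbook is only a sample.

```matlab
data = [1 2; 3 4; 1 2; 5 6; 1 2];        % toy stand-in for the real matrix
[urows, ~, ic] = unique(data, 'rows');   % ic maps each row of data to a row of urows
counts   = accumarray(ic, 1);            % how many times each distinct row occurs
numDup   = size(data,1) - size(urows,1); % number of rows that unique would remove
repeated = urows(counts > 1, :);         % the distinct rows that occur more than once
fprintf('%d duplicate rows found.\n', numDup);
```

If numDup is zero, unique(data,'rows') returns a matrix of the same size as data, which would explain the observation in the comment above.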

More Answers (0)