Huge performance difference of the command find with and without using %M option to show permissions
On my CentOS 7.6 system, I created a folder (called many_files) containing 3,000,000 files by running:
for i in {1..3000000}; do echo $i>$i; done;
I am using the find command to write information about the files in this directory into a file. This works surprisingly fast:
$ time find many_files -printf '%i %y %p\n'>info_file
real 0m6.970s
user 0m3.812s
sys 0m0.904s
Now if I add %M to get the permissions:
$ time find many_files -printf '%i %y %M %p\n'>info_file
real 2m30.677s
user 0m5.148s
sys 0m37.338s
The command takes much longer. This is very surprising to me, since in a C program we can use struct stat to get both the inode and the permission information of a file, and in the kernel struct inode stores both pieces of information.
My Questions:
- What causes this behavior?
- Is there a faster way to get file permissions for so many files?
bash centos permissions find printf
asked 2 hours ago
Bahram
The second question is the wrong question to ask. The real question is what you are doing with the output. If you are piping it somewhere for later processing of files based on the permissions, then you are probably doing it in a roundabout way. Instead you may want to use -perm with find to pick out the files with the permissions you're looking for.
– Kusalananda♦
1 hour ago
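As a rough illustration of the -perm suggestion in the comment above, here is a minimal sketch. The temporary directory, the file names a, b and c, and the chosen modes are all made up for the example; only the -perm test itself comes from the comment.

```shell
# Illustrative only: demo_dir, the file names and the modes are assumptions for this sketch.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/c"
chmod 0644 "$demo_dir/a" "$demo_dir/c"
chmod 0666 "$demo_dir/b"

# -perm -0002 matches files that have at least the "other write" bit set.
world_writable=$(find "$demo_dir" -type f -perm -0002)
printf '%s\n' "$world_writable"

# -perm 0644 matches files whose permission bits are exactly 0644.
exact_0644=$(find "$demo_dir" -type f -perm 0644 | sort)
printf '%s\n' "$exact_0644"
```

Testing permissions presumably still needs each file's inode data, so this is mainly a way to skip a separate post-processing pass over the output, not a guaranteed speed-up.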
2 Answers
For your 1st question:
I think your problem is not how quickly the information is accessed, but an output bottleneck.
You are writing the output to info_file.
When you add %M to the find command, you output more text because of the permissions: 10 additional characters per line of output. That is 30,000,000 more characters.
This is more data that has to go through the STDOUT redirect to info_file and get written to disk. More data to push == longer time to write and complete.
With a single file or a small number of files, the difference would not be noticeable to a human; time may show some variation, but it might be too slight to notice.
In your question you are working with 3,000,000 files, so obviously it takes longer to write out the permissions output.
2nd question:
I have no idea. Do you have a practical use case for needing to collect permissions for 3,000,000 files, or is this an academic exercise?
edited 2 hours ago
answered 2 hours ago
0xSheepdog
info_file has size 94M after the first command and 125M after the second one. An extra 31M shouldn't cause the command to run 20 times slower!
– Bahram
1 hour ago
... it would be easy to test whether this is the case, by replacing %M with a fixed string like -rw-rw-r--
– steeldriver
1 hour ago
I don't think it's a matter of raw "disk space"; I think it has to do with processing each line of output with an extra 10 characters. Depending on exactly what is coming out, that could mean an increase of 30% or more, per line.
– 0xSheepdog
1 hour ago
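The test proposed in the comments above could be sketched roughly as follows, assuming GNU find. The sample directory, the count of 1000 files and the output file names are assumptions chosen to keep the run short; they are not from the thread.

```shell
# Illustrative sketch: sample_dir, out_dir and the file count of 1000 are assumptions.
sample_dir=$(mktemp -d)
out_dir=$(mktemp -d)
i=1
while [ "$i" -le 1000 ]; do
    echo "$i" > "$sample_dir/$i"
    i=$((i + 1))
done

# Variant 1: real permissions via %M.
find "$sample_dir" -printf '%i %y %M %p\n' > "$out_dir/with_M"

# Variant 2: a fixed 10-character string in place of %M, so the output volume is the same.
find "$sample_dir" -printf '%i %y -rw-rw-r-- %p\n' > "$out_dir/fixed"

size_with_M=$(wc -c < "$out_dir/with_M")
size_fixed=$(wc -c < "$out_dir/fixed")
echo "bytes with %M: $size_with_M, bytes with fixed string: $size_fixed"
```

Both variants write the same number of bytes, so prefixing each find with time, as in the question, and running them on the full 3,000,000-file directory should show whether the slowdown follows the output size or the use of %M.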
The first version only needs to readdir(3)/getdents(2) the directory, when run on a filesystem that supports this feature (ext4: the filetype feature, displayed with tune2fs -l /dev/xxx; xfs: ftype=1, displayed with xfs_info /mount/point; ...).
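To decide which of the two checks above applies, one option is to ask for the filesystem type first. This is a minimal sketch assuming GNU coreutils stat; target_dir is a placeholder, and /dev/xxx and /mount/point are left as written in the answer and must be filled in by hand.

```shell
# Illustrative sketch: target_dir is a placeholder; point it at the directory being scanned.
target_dir=${target_dir:-.}

# stat -f reports on the filesystem containing the path; %T is its type in readable form.
fs_type=$(stat -f -c %T "$target_dir")

case "$fs_type" in
    ext2/ext3)
        # GNU stat typically reports ext2, ext3 and ext4 under this one name.
        hint="run: tune2fs -l /dev/xxx   and look for 'filetype' in the features line" ;;
    xfs)
        hint="run: xfs_info /mount/point   and look for 'ftype=1'" ;;
    *)
        hint="neither check from the answer applies; consult the filesystem's documentation" ;;
esac
echo "filesystem type: $fs_type"
echo "$hint"
```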
The second version additionally has to stat(2) each file, which requires an extra inode lookup and thus more seeks on the filesystem and device, possibly much slower if it's a rotating disk and the cache wasn't kept. This stat is not required when looking only for the name, inode and file type, because the directory entry is enough:
The linux_dirent structure is declared as follows:
    struct linux_dirent {
        unsigned long  d_ino;     /* Inode number */
        unsigned long  d_off;     /* Offset to next linux_dirent */
        unsigned short d_reclen;  /* Length of this linux_dirent */
        char           d_name[];  /* Filename (null-terminated) */
                          /* length is actually (d_reclen - 2 -
                             offsetof(struct linux_dirent, d_name)) */
        /*
        char           pad;       // Zero padding byte
        char           d_type;    // File type (only since Linux
                                  // 2.6.4); offset is (d_reclen - 1)
        */
    }
The same information is available to readdir(3):
    struct dirent {
        ino_t          d_ino;       /* Inode number */
        off_t          d_off;       /* Not an offset; see below */
        unsigned short d_reclen;    /* Length of this record */
        unsigned char  d_type;      /* Type of file; not supported
                                       by all filesystem types */
        char           d_name[256]; /* Null-terminated filename */
    };
I suspected this and confirmed it by comparing (on a smaller sample...) the two outputs of:
strace -o v1 find many_files -printf '%i %y %p\n'>info_file
strace -o v2 find many_files -printf '%i %y %M %p\n'>info_file
On my Linux amd64 kernel 5.0.x, the main difference is just:
[...]
getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
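A shorter way to see the same difference is to let strace count the calls rather than list them. This is a minimal sketch, assuming strace and GNU find are installed. The sample directory, the count of 200 files and the trace file location are assumptions, and the stat-family syscall name may differ from the newfstatat shown above depending on kernel and libc.

```shell
# Illustrative sketch: sample_dir, trace_dir and the count of 200 files are assumptions.
sample_dir=$(mktemp -d)
trace_dir=$(mktemp -d)
i=1
while [ "$i" -le 200 ]; do
    echo "$i" > "$sample_dir/$i"
    i=$((i + 1))
done

if command -v strace >/dev/null 2>&1; then
    # -c prints a per-syscall summary instead of every call; -o sends it to a file.
    strace -c -o "$trace_dir/v1" find "$sample_dir" -printf '%i %y %p\n' > /dev/null
    strace -c -o "$trace_dir/v2" find "$sample_dir" -printf '%i %y %M %p\n' > /dev/null
    # Show only the summary rows for stat-family calls.
    echo "without %M:"
    grep -i 'stat' "$trace_dir/v1" 2>/dev/null || echo "  (no stat-family calls listed)"
    echo "with %M:"
    grep -i 'stat' "$trace_dir/v2" 2>/dev/null || echo "  (no stat-family calls listed)"
else
    echo "strace is not installed; skipping the comparison"
fi
```

If the explanation above holds, the count in the "calls" column for the stat-family row should grow by roughly one per file in the second summary.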
Unfortunately, the d_type field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter.)
– mosvy
35 mins ago
@mosvy That's ok, the question is tagged CentOS. But yes, I understand that on other *nix, results may differ.
– A.B
35 mins ago
Hum, actually xfs (CentOS' default) support isn't quite clear...
– A.B
28 mins ago
Added how to check if the filetype feature is present on xfs, in case xfs is in use.
– A.B
18 mins ago
I think it's supported on xfs -- when I was making a testcase for a glibc glob(3) that only triggered when the d_type field was absent, I had to use either minixfs or use the GLOB_ALTDIRFUNC.
– mosvy
12 mins ago
|
show 1 more comment
The first version requires only to readdir(3)
/getdents(2)
the directory, when run on a filesystem supporting this feature (ext4: filetype
feature displayed with tune2fs -l /dev/xxx
, xfs: ftype=1
displayed with xfs_info /mount/point
...).
The second version in addition also requires to stat(2)
each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat
is not required when looking only for name, inode and filetype because the directory entry is enough:
The linux_dirent structure is declared as follows:
struct linux_dirent {
unsigned long d_ino; /* Inode number */
unsigned long d_off; /* Offset to next linux_dirent */
unsigned short d_reclen; /* Length of this linux_dirent */
char d_name[]; /* Filename (null-terminated) */
/* length is actually (d_reclen - 2 -
offsetof(struct linux_dirent, d_name)) */
/*
char pad; // Zero padding byte
char d_type; // File type (only since Linux
// 2.6.4); offset is (d_reclen - 1)
*/
}
the same informations are available to readdir(3)
:
struct dirent {
ino_t d_ino; /* Inode number */
off_t d_off; /* Not an offset; see below */
unsigned short d_reclen; /* Length of this record */
unsigned char d_type; /* Type of file; not supported
by all filesystem types */
char d_name[256]; /* Null-terminated filename */
};
Suspected but confirmed by comparing (on a smaller sample...) the two outputs of:
strace -o v1 find many_files -printf '%i %y %pn'>info_file
strace -o v2 find many_files -printf '%i %y %M %pn'>info_file
Which on my Linux amd64 kernel 5.0.x just shows as main difference:
[...]
getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_filesn25502410 f"..., 4096) = 4096
-write(1, "iles/844n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686n25502095 f "..., 4096) = 4096
-write(1, "es/529n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371n25501780 f ma"..., 4096) = 4096
-write(1, "/214n25497527 f many_files/213n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
Unfortunately, thed_type
field of a dir entry is a non-standard feature, only present on Linux and BSD, as mentioned in the readdir(3) manpage. (Though on Linux it is implemented on most filesystems that matter).
– mosvy
35 mins ago
@mosvy That's ok, the question is tagged CentOS. But yes I understand that on other *nix, results may differ
– A.B
35 mins ago
Hum actually xfs (CentOS' default) support isn't quite clear...
– A.B
28 mins ago
added how to check if the filetype feature is present on xfs, in case xfs is in use.
– A.B
18 mins ago
I think it's supported on xfs -- when I was making a testcase for a glibcglob(3)
that only triggered when thed_type
field was absent, I had to use either minixfs or use theGLOB_ALTDIRFUNC
.
– mosvy
12 mins ago
|
show 1 more comment
The first version requires only to readdir(3)
/getdents(2)
the directory, when run on a filesystem supporting this feature (ext4: filetype
feature displayed with tune2fs -l /dev/xxx
, xfs: ftype=1
displayed with xfs_info /mount/point
...).
The second version in addition also requires to stat(2)
each file, requiring an additional inode lookup, and thus more seeks on the filesystem and device, possibly quite slower if it's a rotating disk and cache wasn't kept. This stat
is not required when looking only for name, inode and filetype because the directory entry is enough:
The linux_dirent structure is declared as follows:
struct linux_dirent {
    unsigned long  d_ino;     /* Inode number */
    unsigned long  d_off;     /* Offset to next linux_dirent */
    unsigned short d_reclen;  /* Length of this linux_dirent */
    char           d_name[];  /* Filename (null-terminated) */
                      /* length is actually (d_reclen - 2 -
                         offsetof(struct linux_dirent, d_name)) */
    /*
    char           pad;       // Zero padding byte
    char           d_type;    // File type (only since Linux
                              // 2.6.4); offset is (d_reclen - 1)
    */
};
The same information is available through readdir(3):
struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not an offset; see below */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported
                                   by all filesystem types */
    char           d_name[256]; /* Null-terminated filename */
};
I suspected this and confirmed it by comparing (on a smaller sample) the strace logs of the two commands:
strace -o v1 find many_files -printf '%i %y %p\n' > info_file
strace -o v2 find many_files -printf '%i %y %M %p\n' > info_file
On my Linux amd64 system (kernel 5.0.x), the main difference between the two logs is:
[...]
getdents(4, /* 0 entries */, 32768) = 0
close(4) = 0
fcntl(5, F_DUPFD_CLOEXEC, 0) = 4
-write(1, "25499894 d many_files\n25502410 f"..., 4096) = 4096
-write(1, "iles/844\n25502253 f many_files/8"..., 4096) = 4096
-write(1, "096 f many_files/686\n25502095 f "..., 4096) = 4096
-write(1, "es/529\n25501938 f many_files/528"..., 4096) = 4096
-write(1, "1 f many_files/371\n25501780 f ma"..., 4096) = 4096
-write(1, "/214\n25497527 f many_files/213\n2"..., 4096) = 4096
-brk(0x55b29a933000) = 0x55b29a933000
+newfstatat(5, "1000", {st_mode=S_IFREG|0644, st_size=5, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "999", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "998", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "997", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "996", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "995", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "994", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "993", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "992", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "991", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+newfstatat(5, "990", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
+newfstatat(5, "891", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
+write(1, "25499894 d drwxr-xr-x many_files"..., 4096) = 4096
+newfstatat(5, "890", {st_mode=S_IFREG|0644, st_size=4, ...}, AT_SYMLINK_NOFOLLOW) = 0
[...]
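To reproduce the comparison on a small sample, something like the following could be used. It is a sketch under assumptions: the helper names (make_sample, list_plain, list_with_mode, syscall_summary) are made up for illustration, the layout of the sample directory (files named 1..N, each containing its own number) is only inferred from the sizes shown in the trace, and -printf is a GNU find extension.

```shell
# Hypothetical helpers for reproducing the comparison on a small sample.

# make_sample DIR N: create DIR containing files named 1..N, each holding
# its own number (an assumption about how many_files was built).
make_sample() {
    mkdir -p "$1" || return 1
    i=1
    while [ "$i" -le "$2" ]; do
        echo "$i" > "$1/$i" || return 1
        i=$((i + 1))
    done
}

# First version: inode, file type and path only.
list_plain() {
    find "$1" -printf '%i %y %p\n'
}

# Second version: additionally prints the symbolic mode (%M).
list_with_mode() {
    find "$1" -printf '%i %y %M %p\n'
}

# syscall_summary OUTFILE CMD...: run CMD under strace and write a
# per-syscall count table to OUTFILE (strace -c), discarding CMD's stdout.
# CMD must be a real program such as find, not a shell function.
syscall_summary() {
    out=$1
    shift
    strace -c -o "$out" "$@" > /dev/null
}
```

For example, after `make_sample many_files 1000`, running `syscall_summary v1.summary find many_files -printf '%i %y %p\n'` and the same command with `%M` added into v2.summary gives two tables to compare. On a filesystem that provides d_type, the second table should show roughly one extra stat-family call (newfstatat in the trace above) per file.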
edited 22 mins ago
answered 53 mins ago
A.B
The second question is the wrong question to ask. The real question is what you are doing with the output. If you are piping it somewhere for later processing of files based on the permissions, then you are probably doing it in a roundabout way. Instead you may want to use -perm with find to pick out the files with the permissions you're looking for.
– Kusalananda♦
1 hour ago
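A minimal sketch of that suggestion, assuming the goal is to select files by permission rather than to print the modes and post-process them. The function names and the example modes are made up for illustration.

```shell
# files_with_exact_mode DIR MODE: regular files under DIR whose permission
# bits are exactly MODE (octal), e.g. 644.
files_with_exact_mode() {
    find "$1" -type f -perm "$2"
}

# files_with_all_bits DIR MODE: regular files under DIR that have at least
# all the permission bits in MODE set, e.g. 022 for files writable by both
# group and others.
files_with_all_bits() {
    find "$1" -type f -perm "-$2"
}
```

For instance, `files_with_exact_mode many_files 644` lists only the files whose mode is exactly rw-r--r--. Note that find still has to stat each file to evaluate -perm, so this simplifies the pipeline rather than avoiding the per-file lookup.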