Issue41354
Created on 2020-07-21 07:29 by chanke, last changed 2020-12-28 07:45 by chanke.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 21580 | open | chanke, 2020-07-21 09:26 | |
| Messages (3) | |||
|---|---|---|---|
| msg374054 - (view) | Author: Christof Hanke (chanke) * | Date: 2020-07-21 07:29 | |
help(filecmp.cmp) says:
"""
cmp(f1, f2, shallow=True)
Compare two files.
Arguments:
f1 -- First file name
f2 -- Second file name
shallow -- Just check stat signature (do not read the files).
defaults to True.
Return value:
True if the files are the same, False otherwise.
This function uses a cache for past comparisons and the results,
with cache entries invalidated if their stat information
changes. The cache may be cleared by calling clear_cache().
"""
However, looking at the code, the shallow-argument is taken only into account if the signatures are the same:
"""
s1 = _sig(os.stat(f1))
s2 = _sig(os.stat(f2))
if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
return False
if shallow and s1 == s2:
return True
if s1[1] != s2[1]:
return False
outcome = _cache.get((f1, f2, s1, s2))
if outcome is None:
outcome = _do_cmp(f1, f2)
if len(_cache) > 100: # limit the maximum size of the cache
clear_cache()
_cache[f1, f2, s1, s2] = outcome
return outcome
"""
Therefore, if I call cmp with shallow=True and the stat-signatures differ,
cmp actually does a "deep" compare.
This "deep" compare however does not check the stat-signatures.
Thus I propose follwing patch:
cmp always checks the "full" signature.
return True if shallow and above test passed.
It does not make sense to me that when doing a "deep" compare, that only the size
is compared, but not the mtime.
--- filecmp.py.orig 2020-07-16 12:00:57.000000000 +0200
+++ filecmp.py 2020-07-16 12:00:30.000000000 +0200
@@ -52,10 +52,10 @@
s2 = _sig(os.stat(f2))
if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
return False
- if shallow and s1 == s2:
- return True
- if s1[1] != s2[1]:
+ if s1 != s2:
return False
+ if shallow:
+ return True
outcome = _cache.get((f1, f2, s1, s2))
if outcome is None:
|
|||
| msg383191 - (view) | Author: Scott (FreeSandwiches) | Date: 2020-12-16 17:46 | |
I suggest changing the documentation rather than the code. The mix up is in the wording. Documentation below "If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared." The "Otherwise" appears to be referencing the "If shallow is true" cause, when it should be referring to the equality of the _sig()s. Proposed Change "If shallow is true, files with identical os.stat() signatures are taken to be equal. If they are not equal, the contents of the files are compared." |
|||
| msg383885 - (view) | Author: Christof Hanke (chanke) * | Date: 2020-12-28 07:45 | |
I understand that you are reluctant to change existing code. But for me as a sysadmin, the current behavior doesn't make sense for two reasons: * st.st_size is part of _sig. why would you do a deep compare if the two files have a different length ? * comparing thousands of files, a proper shallow-only compare is required, since it takes a long time to compare large files (especially when they are migrated to a tape-backend), so a silent-fallback to a deep-compare is not good. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2020-12-28 07:45:44 | chanke | set | messages: + msg383885 |
| 2020-12-16 17:46:24 | FreeSandwiches | set | nosy:
+ FreeSandwiches messages: + msg383191 |
| 2020-07-21 09:26:27 | chanke | set | keywords:
+ patch stage: patch review pull_requests: + pull_request20722 |
| 2020-07-21 07:29:03 | chanke | create | |