Compare folder A and subfolder B and show files that are in folder A but not in subfolder B #113736

Closed
me-suzy opened this issue Jan 5, 2024 · 6 comments
Labels
pending The issue will be closed if no feedback is provided type-bug An unexpected behavior, bug, or error

Comments

@me-suzy

me-suzy commented Jan 5, 2024

Bug report

Bug description:

I think Python itself has a big bug involving a folder and its subfolder. With two unrelated folders, the comparison works fine, but something goes wrong when the files in a folder are compared against the files in one of its subfolders.

The task: compare folder A with subfolder B and show the files that are in folder A but not in subfolder B. When checking, the code sees the files of both folders, but the output does not show only the files that are in folder A and missing from subfolder B.

Please see this topic:

https://stackoverflow.com/questions/77764496/compare-folder-a-and-subfolder-b-and-show-files-that-are-in-folder-a-but-not-in

I tried many combinations.

Another version of the code here:

import os

folder1 = r"C:\Folder-Oana\extracted"
folder2 = r"C:\Folder-Oana\extracted\translated"

# Get the list of HTML files in each folder
html_files_folder1 = [f.lower() for f in os.listdir(folder1) if f.lower().endswith('.html')]
print(html_files_folder1)  # if this is correct, move on
html_files_folder2 = [f.lower() for f in os.listdir(folder2) if f.lower().endswith('.html')]
print(html_files_folder2)  # if this is correct, move on

# Find the differences between the two file lists
missing_files = list(set(html_files_folder1) - set(html_files_folder2))
print(missing_files)  # if the difference is correct, move on

# Display the missing files
if missing_files:
    print("The HTML files that are in folder 1 but not in folder 2 are:")
    for filename in missing_files:
        print(filename)
else:
    print("There are no HTML files that are in folder 1 but not in folder 2.")

CPython versions tested on:

3.8

Operating systems tested on:

Windows

@me-suzy me-suzy added the type-bug An unexpected behavior, bug, or error label Jan 5, 2024
@me-suzy
Author

me-suzy commented Jan 5, 2024

Here is the structure of the folder, subfolder and files:

https://snipboard.io/jadcTP.jpg

@sobolevn
Member

sobolevn commented Jan 5, 2024

Right now this does not look like a bug in CPython, but looks like a bug in your code :)

Please simplify the reproducer, show what you expect to happen and what actually happens, and try the latest version of Python (if you are confident that this really is a bug in CPython itself).

@sobolevn sobolevn added the pending The issue will be closed if no feedback is provided label Jan 5, 2024
@ericvsmith
Member

And when simplifying the reproducer, make sure that we can run the code, without any additional setup. Your reproducer should create whatever minimum directories and files are needed to show the problem.
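
A self-contained reproducer along those lines might look like the sketch below. It builds a throwaway parent folder with a `translated` subfolder inside it, so nothing on disk needs to exist beforehand. The function names and the file names `a.html`, `b.html`, `c.html` are made up for illustration and are not taken from the original report.

```python
import os
import tempfile


def html_only_in_parent(parent, child):
    """Return the sorted .html names present in parent but not in child."""
    def html_names(path):
        # os.listdir returns names only (no paths), one directory level deep.
        return {
            name.lower()
            for name in os.listdir(path)
            if name.lower().endswith(".html")
        }

    return sorted(html_names(parent) - html_names(child))


def build_and_check():
    """Create a temporary folder/subfolder pair and compare them."""
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "translated")
        os.mkdir(child)
        # Three files in the parent, only one of them also in the subfolder.
        for name in ("a.html", "b.html", "c.html"):
            open(os.path.join(parent, name), "w").close()
        open(os.path.join(child, "a.html"), "w").close()
        return html_only_in_parent(parent, child)


if __name__ == "__main__":
    print(build_and_check())
```

If this prints the two names that exist only in the parent, the set difference itself behaves the same for a folder and its subfolder, and the difference in results most likely comes from the actual contents of the real directories.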

@me-suzy
Author

me-suzy commented Jan 6, 2024

It is possible that this code works on other computers. I had an older version of Python, but I uninstalled it and installed the new version, Python 3.12.1:

C:\Users\Castel>python --version
Python 3.12.1

However, there is something strange. All the code versions I tested, including ones made with ChatGPT, have a problem with the FOLDER vs SUBFOLDER comparison. That is, instead of finding only the HTML files in FOLDER A, the code also finds the HTML files in SUBFOLDER B.

If the error is mine, what should I do to make it work?

Another version:

import os
import filecmp

def compare_folders(folder1, folder2):
    dcmp = filecmp.dircmp(folder1, folder2)
    diff_files = dcmp.diff_files
    
    if diff_files:
        print(f"The following files are different between {folder1} and {folder2}:")
        for file in diff_files:
            print(f" - {file}")
    else:
        print(f"No differences found between {folder1} and {folder2}.")

if __name__ == "__main__":
    folder1 = r"C:\Folder-Oana\extracted"
    folder2 = r"C:\Folder-Oana\extracted\translated"
    compare_folders(folder1, folder2)

OR THIS ONE

import os
import filecmp

def compare_folders(folder1, folder2):
    differences = filecmp.dircmp(folder1, folder2).diff_files

    if differences:
        print(f"The following files are different between {folder1} and {folder2}:")
        print("\n".join([f" - {file}" for file in differences]))
    else:
        print(f"No differences found between {folder1} and {folder2}.")

if __name__ == "__main__":
    folder1 = r"C:\Folder-Oana\extracted"
    folder2 = r"C:\Folder-Oana\extracted\translated"
    compare_folders(folder1, folder2)
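
One thing worth checking in the two versions above: `dircmp.diff_files` lists names that exist in both directories but whose contents differ, while names that exist only in the first directory are reported by `dircmp.left_only`. The sketch below shows the two attributes side by side on a temporary folder layout. The helper names `summarize` and `demo` and the file names are illustrative assumptions.

```python
import filecmp
import os
import tempfile


def summarize(folder1, folder2):
    """Collect the dircmp attributes relevant to this comparison."""
    dcmp = filecmp.dircmp(folder1, folder2)
    return {
        # Names found only in folder1 (files and subdirectories alike).
        "left_only": sorted(dcmp.left_only),
        # Names found only in folder2.
        "right_only": sorted(dcmp.right_only),
        # Names in both folders whose contents differ.
        "diff_files": sorted(dcmp.diff_files),
    }


def demo():
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "translated")
        os.mkdir(child)
        with open(os.path.join(parent, "a.html"), "w") as f:
            f.write("original")
        with open(os.path.join(child, "a.html"), "w") as f:
            f.write("translated text")
        with open(os.path.join(parent, "b.html"), "w") as f:
            f.write("only in parent")
        return summarize(parent, child)


if __name__ == "__main__":
    for key, names in demo().items():
        print(key, names)
```

Note that `left_only` also contains the name of the subfolder itself, because the subfolder is an entry of the parent directory that has no counterpart inside the subfolder.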

@me-suzy
Author

me-suzy commented Jan 6, 2024

WORKS. The problem was with the version of Python. Thank you!

import os

def find_files_only_in_folder(folder, subfolder):
    folder_files = set(os.listdir(folder))
    subfolder_files = set(os.listdir(subfolder))

    files_only_in_folder = folder_files - subfolder_files

    print(f"Files only in {folder} and not in {subfolder}:")
    for file in files_only_in_folder:
        print(f"- {file}")

if __name__ == "__main__":
    folder_path = r"c:\Folder-Oana\extracted"
    subfolder_path = r"c:\Folder-Oana\extracted\translated"

    find_files_only_in_folder(folder_path, subfolder_path)
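
A small caveat for the version above: `os.listdir` returns directory names as well as file names, so the subfolder (`translated` here) shows up in the result as if it were a file missing from the subfolder. A variant that keeps only regular files could be sketched as follows. The function name `files_only_in_folder`, the `demo` helper and the file names are hypothetical.

```python
import os
import tempfile


def files_only_in_folder(folder, subfolder):
    """Return sorted names of regular files in folder that are not in subfolder."""
    def file_names(path):
        # os.scandir yields DirEntry objects; is_file() filters out directories.
        with os.scandir(path) as entries:
            return {entry.name for entry in entries if entry.is_file()}

    return sorted(file_names(folder) - file_names(subfolder))


def demo():
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "translated")
        os.mkdir(child)
        for name in ("a.html", "b.html"):
            open(os.path.join(parent, name), "w").close()
        open(os.path.join(child, "a.html"), "w").close()
        return files_only_in_folder(parent, child)


if __name__ == "__main__":
    print(demo())
```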

VERSION 2. This version compares files by both name and content. It checks whether the content of each shared file differs and shows the files that exist in only one of the folders.

import filecmp
import os

def compare_folders_content(folder1, folder2):
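    # dircmp accepts the shallow keyword only from Python 3.13 onwards;
    # it is not needed here because the contents are compared explicitly below.
    dcmp = filecmp.dircmp(folder1, folder2)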
    
    for common_file in dcmp.common_files:
        file1_path = os.path.join(folder1, common_file)
        file2_path = os.path.join(folder2, common_file)
        
        with open(file1_path, 'rb') as file1, open(file2_path, 'rb') as file2:
            content1 = file1.read()
            content2 = file2.read()

        if content1 != content2:
            print(f"File {common_file} has different content.")
    
    for only_file in dcmp.left_only + dcmp.right_only:
        print(f"File {only_file} exists only in {folder1 if only_file in dcmp.left_only else folder2}.")

if __name__ == "__main__":
    folder1 = r"C:\Folder-Oana\extracted"
    folder2 = r"C:\Folder-Oana\extracted\translated"
    compare_folders_content(folder1, folder2)
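
As a possible alternative to reading both files by hand, `filecmp.cmpfiles` can compare a list of common names and accepts `shallow=False` to force a content comparison. The following is a minimal sketch under that assumption. The function name `changed_common_files`, the `demo` helper and the sample contents are illustrative only.

```python
import filecmp
import os
import tempfile


def changed_common_files(folder1, folder2):
    """Return (mismatched names, names that could not be compared)."""
    common = filecmp.dircmp(folder1, folder2).common_files
    # shallow=False compares file contents instead of trusting os.stat data.
    match, mismatch, errors = filecmp.cmpfiles(
        folder1, folder2, common, shallow=False
    )
    return sorted(mismatch), sorted(errors)


def demo():
    with tempfile.TemporaryDirectory() as parent:
        child = os.path.join(parent, "translated")
        os.mkdir(child)
        contents = {
            "a.html": ("same", "same"),  # identical in both folders
            "b.html": ("one", "two"),    # same size, different content
        }
        for name, (left, right) in contents.items():
            with open(os.path.join(parent, name), "w") as f:
                f.write(left)
            with open(os.path.join(child, name), "w") as f:
                f.write(right)
        return changed_common_files(parent, child)


if __name__ == "__main__":
    print(demo())
```

The second list returned by `changed_common_files` collects names that could not be compared, for example because a file was unreadable, which the manual `open`/`read` loop would instead surface as an exception.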

@sobolevn sobolevn closed this as not planned Won't fix, can't repro, duplicate, stale Jan 6, 2024
@ericvsmith
Member

In the future, please seek help at https://discuss.python.org/c/users/7 before opening an issue here.
