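Author: siseniao
Improved from this article: https://post.smzdm.com/p/an370emp/?zdm_ss=Android_1106136211_&send_by=1106136211&from=other&invite_code=zdmwffzv7winv
The original script recalculated every file's MD5 on each scan, which means heavy disk I/O when the files are large. This version adds a cache file that stores the MD5s, so each scan only hashes new files and runs faster.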
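Caching is one way to avoid rereading files. Another common technique, not used in this script, is to compare file sizes first: files of different sizes cannot be identical, so only files whose size matches another file's need to be hashed at all. A minimal sketch of that idea. The helper names are hypothetical, and it only reports groups of duplicates without deleting anything:

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, block_size=4096):
    """MD5 of a file's content, read in fixed-size blocks."""
    md5_hash = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            md5_hash.update(block)
    return md5_hash.hexdigest()


def find_duplicate_groups(paths):
    """Return lists of paths with identical content, hashing only size collisions."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_content = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate, skip hashing
        for path in same_size:
            by_content[(size, file_md5(path))].append(path)
    return [group for group in by_content.values() if len(group) > 1]
```

Passing it the list of candidate paths the script collects would show which files would be affected before anything is removed.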
Without further ado, here is the code:
import os
import hashlib

# Only duplicates of the following file types are deleted. To handle other types, add their extensions here (lowercase, with the leading dot).
file_type = ['.jpg', '.jpeg', '.png', '.gif', '.psd', '.bmp', '.webp', '.mp4', '.mkv', '.avi', '.mov', '.mpeg', '.mpg',
             '.rar', '.zip']
# Edit this list of directories to suit your setup
work_dir_list = [r'/volume2/111', r'/volume1/222']
# Keep the cache next to the script, so it is found even when a scheduled task runs from another working directory
md5_cache_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), "md5.txt")


def save_md5_file(files_dict: dict):
    """Save the cache as one "path_md5=file_md5" line per file."""
    if files_dict is None:
        return
    try:
        with open(md5_cache_file, "w") as f:
            for path_md5, file_md5 in files_dict.items():
                f.write(str(path_md5) + "=" + str(file_md5) + '\n')
    except Exception as e:
        print('Failed to save MD5 cache:', e)


def open_md5_file():
    """Load the cache; return an empty dict on the first run."""
    files_md5 = {}
    try:
        with open(md5_cache_file, "r") as f:
            for md5_line in f:
                list_keys = md5_line.split('=')
                if len(list_keys) == 2:
                    files_md5[list_keys[0].strip()] = list_keys[1].strip()
    except Exception:
        pass
    return files_md5


def path_key(path):
    """Cache key: MD5 of the file path."""
    return hashlib.md5(path.encode('utf-8')).hexdigest()


def remove_repeat_files():
    # Collect candidate files (directories are never candidates)
    check_files = []
    for work_dir in work_dir_list:
        for root, dirs, files in os.walk(work_dir):
            for name in files:
                # Compare extensions case-insensitively, so .JPG matches .jpg
                if os.path.splitext(name)[1].lower() in file_type:
                    check_files.append(os.path.join(root, name))
    files_dict = {}              # content MD5 -> path of the copy being kept
    files_md5 = open_md5_file()  # cache loaded from disk: path MD5 -> content MD5
    new_md5 = {}                 # cache to save: only files that still exist
    r_index = 0
    print('Files Num:%s' % len(check_files))
    for file_path in check_files:
        try:
            path_md5 = path_key(file_path)
            file_md5 = files_md5.get(path_md5)
            if file_md5 is None:
                # Not cached yet: hash the content in 4 KB blocks (read-only mode is enough)
                md5_hash = hashlib.md5()
                with open(file_path, "rb") as f:
                    for byte_block in iter(lambda: f.read(4096), b""):
                        md5_hash.update(byte_block)
                file_md5 = md5_hash.hexdigest()
                print('Check file MD5:%s' % file_path)
            new_md5[path_md5] = file_md5
            if file_md5 not in files_dict:
                files_dict[file_md5] = file_path
                continue
            d_path = files_dict[file_md5]
            if os.path.samefile(d_path, file_path):
                continue  # same file reached twice (overlapping directories or a hard link): never delete it
            # Keep the copy with the older st_ctime and delete the other one.
            # Note: on Linux st_ctime is the last metadata change time, not the creation time.
            if os.stat(d_path).st_ctime > os.stat(file_path).st_ctime:
                os.remove(d_path)
                new_md5.pop(path_key(d_path), None)
                files_dict[file_md5] = file_path
                print('Delete File:', d_path)
            else:
                os.remove(file_path)
                new_md5.pop(path_md5, None)
                print('Delete File:', file_path)
            r_index += 1
        except Exception as e:
            print('Skip %s: %s' % (file_path, e))
    print('File Count:%s, Repeat Files Num:%s. All deleted!' % (len(check_files), r_index))
    save_md5_file(new_md5)


if __name__ == '__main__':
    remove_repeat_files()
You can run it from an SSH session or as a task in Task Scheduler.
Source: SMZDM (什么值得买), "Using Python to delete duplicate files on Synology (caching file MD5s)", published 2023-03-20.
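One limitation of keying the cache by the MD5 of the path alone: if a file is overwritten in place with different content, the script reuses the old hash. A minimal sketch of a more defensive cache, assuming you are willing to switch the cache file to JSON. It stores each file's size and modification time next to its hash and rehashes when either changes. The function names and JSON layout are hypothetical, not part of the original script, and the plain path is used as the key instead of its MD5:

```python
import hashlib
import json
import os


def content_md5(path, block_size=4096):
    """MD5 of the file content, read in blocks."""
    md5_hash = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            md5_hash.update(block)
    return md5_hash.hexdigest()


def cached_md5(path, cache):
    """Reuse the cached hash only if the file's size and mtime are unchanged."""
    st = os.stat(path)
    entry = cache.get(path)
    if entry and entry["size"] == st.st_size and entry["mtime_ns"] == st.st_mtime_ns:
        return entry["md5"]
    digest = content_md5(path)
    cache[path] = {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "md5": digest}
    return digest


def load_cache(cache_path):
    """Load the JSON cache; start empty if it is missing or unreadable."""
    try:
        with open(cache_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def save_cache(cache, cache_path, seen_paths):
    """Save only entries for files seen in this scan, so deleted files drop out."""
    kept = {path: entry for path, entry in cache.items() if path in seen_paths}
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False)
```

The extra stat() per file is cheap compared with rereading the content, so the cache still saves most of the disk I/O.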
Q: How can C and Python be combined in one program?
A: One way to mix C and Python; the code is as follows:
/* tcpportping.c */
#include <Python.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <sys/time.h>
/* count time functions */
static double ...
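Q: Do I need to declare anything before calling str() in Python?
A: No. str is the string type, much like int, and str() simply converts other types to a string. repr() works the same way. Both are built-in functions, so no module needs to be imported.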
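A quick illustration that str() and repr() work without any import, and of how their output differs:

```python
# No import is needed: str() and repr() are built-in functions.
values = [42, 3.5, None, [1, 2], "hi"]

# str() gives the readable form; repr() gives an unambiguous form
# (for strings it adds quotes, so "42" and 42 can be told apart).
pairs = [(str(v), repr(v)) for v in values]
for s, r in pairs:
    print(s, r)
```

Only the last line differs: str("hi") prints hi, while repr("hi") prints 'hi'.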
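Q: In C++, what do strcpy() and strcat() do, and what does str+2 mean?
A: The answer is C. strcpy() copies a string. strcat(p1, p2) appends p2 to the end of p1 and returns p1. str+2 uses the array as a pointer: str points to the first element of the array, so str+2 points to the third element, the letter z. Arrays and pointers are closely related in C/C++; check a textbook if this is unclear. The statement strcpy(str+2, strcat(p1, p2)); first joins p1 and p2 to get abcABC, then copies that string to the position str+2 points to, so the result is xyabcABC.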
Q: Urgent: could someone rewrite this Python code in C++?
A:
class Solution(object):
    def findKthNumber(self, n, k):
        L = len(str(n))
        onezeros = pow(10, L - 1)
        weight = int(''.join(['1'] * L)) - 1
        path = 0
        index = 0
        while True:
            if index == k:
                return path
            index += 1
            for child in range(10):
                if path == 0 and child == 0:
                    ...
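The snippet above is cut off, but findKthNumber(n, k) matches the well-known task of finding the k-th smallest number in [1, n] in lexicographical (dictionary) order. A minimal sketch, assuming that is the intended task, using the common prefix-counting approach. This is not necessarily the algorithm of the truncated code, and find_kth_number is a hypothetical name:

```python
def find_kth_number(n: int, k: int) -> int:
    """k-th (1-based) smallest integer in [1, n] in lexicographical order."""

    def count_steps(prefix: int) -> int:
        # How many numbers in [1, n] start with `prefix` (its subtree in the 10-ary trie)
        steps, first, last = 0, prefix, prefix
        while first <= n:
            steps += min(n, last) - first + 1
            first *= 10
            last = last * 10 + 9
        return steps

    current = 1
    k -= 1  # we already stand on "1", the first number in order
    while k > 0:
        steps = count_steps(current)
        if steps <= k:
            k -= steps     # skip the whole subtree and move to the next sibling
            current += 1
        else:
            k -= 1         # step down into the subtree
            current *= 10
    return current


print(find_kth_number(13, 2))  # 10, since the order is 1, 10, 11, 12, 13, 2, ...
```

Each step either skips a whole prefix subtree or descends one level, so the loop stays fast even for very large n, which a port to C++ would keep.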
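Q: What encoding does Python's str use?
A: In Python 2 there are two types, str and unicode, and both are subclasses of basestring, so you can test whether a value is a string with:
def is_str(s):
    return isinstance(s, basestring)
To convert between str and unicode, use decode and encode (see their documentation):
str -> decode('the_coding_of_str') -> unicode
unicode -> encode('...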
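In Python 3, basestring and unicode no longer exist: str always holds Unicode text and bytes holds encoded data. A minimal sketch of the same check and conversions in Python 3 (to_text and to_bytes are hypothetical helper names):

```python
# Python 3: text is always `str` (Unicode); encoded data is `bytes`.
def is_str(s):
    return isinstance(s, str)


def to_text(data, encoding="utf-8"):
    """bytes -> str (the role of str.decode in Python 2)."""
    return data.decode(encoding) if isinstance(data, bytes) else data


def to_bytes(text, encoding="utf-8"):
    """str -> bytes (the role of unicode.encode in Python 2)."""
    return text.encode(encoding) if isinstance(text, str) else text


print(to_bytes("中"))            # b'\xe4\xb8\xad'
print(to_text(b"\xe4\xb8\xad"))  # 中
```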
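Q: Overlapping arguments in C's strcpy
A: The simplest example is copying a string onto itself:
char s[] = "123"; strcpy(s, s); // the source and destination overlap
The second case is partial overlap:
char s[20] = "123456789"; char *t = s + 6; strcpy(s, t); // afterwards, printf("%s\n", s); prints "789"
Strictly speaking, the C standard leaves strcpy undefined when the source and destination overlap, so "789" is only what typical implementations print; use memmove when the regions may overlap.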
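Q: What exactly do the changes to str, unicode and UTF-8 between Python 2 and 3 mean?
A: In Python 2.x, str essentially means "text in some encoding". Most strings enclosed in quotes are str, and their encoding is simply the encoding of your Python source file; on Windows, for example, GBK is the default. unicode means "using...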
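The Python 3 side of the change fits in a few lines: a literal such as "中文" is a str of Unicode code points, bytes in a specific encoding only appear when you call encode(), and the two types are never mixed implicitly. A short sketch:

```python
text = "中文"                 # str: a sequence of Unicode code points
utf8 = text.encode("utf-8")  # bytes: 3 bytes per character for these two
gbk = text.encode("gbk")     # bytes: 2 bytes per character for these two

print(len(text), len(utf8), len(gbk))  # 2 6 4

# Unlike Python 2, str and bytes are never combined implicitly
try:
    text + utf8
    mixing_raises = False
except TypeError:
    mixing_raises = True
print("str + bytes raises TypeError:", mixing_raises)
```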
Q: Writing a simple function in Python
A: A function name cannot contain #, so:
#!python3
import re

def extract(s):
    return [i[1:] for i in re.findall(r'#\w+', s)]

print(extract('ABC #123ab! #abc'))
print(extract('ABC #123ab! #123ab! #abc'))
Output:
[willie@bogon ~]$ python3
Python 3.5.2 (default, Sep 30 ...
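A slightly simpler variant puts the tag in a capturing group, so re.findall returns it without the '#' and no slicing is needed. If repeated tags should be reported only once, dict.fromkeys removes duplicates while keeping their order. extract_tags and extract_unique_tags are hypothetical names:

```python
import re


def extract_tags(s):
    """Return tags written as #word, without the leading '#'."""
    # With one capture group, findall returns just the captured text
    return re.findall(r'#(\w+)', s)


def extract_unique_tags(s):
    """Same tags with repeats removed, keeping first-seen order."""
    return list(dict.fromkeys(extract_tags(s)))


print(extract_tags('ABC #123ab! #abc'))                    # ['123ab', 'abc']
print(extract_unique_tags('ABC #123ab! #123ab! #abc'))     # ['123ab', 'abc']
```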
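Q: How do I call C and C++ code from Python?
A: To understand how to call C/C++ code from Python (that is, how to write a Python extension), you have to conquer the thick "Extending and Embedding" chapter of the manual. Yesterday I spent an hour on it, came away with a headache and still didn't know how to write an extension, so I looked through some other books and finally...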
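Writing a full extension module is not the only route. To call an existing C function from a shared library, the standard-library ctypes module needs no C compilation at all. A minimal sketch, assuming Linux or macOS, that calls strlen from the C standard library (c_strlen is a hypothetical wrapper name):

```python
import ctypes

# Assumption: Linux or macOS. CDLL(None) exposes the symbols already loaded
# into the Python process, which include the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Call C's strlen on the UTF-8 encoding of `text`."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("hello"))  # 5
```

For your own library, ctypes.CDLL("path/to/libname.so") loads it the same way. C++ functions must be exported with extern "C", because C++ name mangling changes the symbol names ctypes looks up.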
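Q: Could someone convert this Python code into C?
A: IPython is a very useful Python shell. It is easy to install on Linux, but installing it on Windows takes a bit more work; the detailed steps follow. Tools/materials: python, ez_setup.py, pyreadline, ipython. Steps: 1. After Python is installed, you need to install ez_setup.py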