How to Use text in Python

Source: baiyundou.net    Date: 2024-09-23

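[CSDN Editor's Note] IBM engineer Martin Heinz writes that Python's moment of true multithreading is about to arrive!

Original: https://martinheinz.dev/blog/97

Reproduction without permission is prohibited!

Author | Martin Heinz    Editor | 梦依丹

Translation tool | ChatGPT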

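At 32 years old, Python still lacks true parallelism. That is about to change. The upcoming Python 3.12 introduces a new feature called the "Per-Interpreter GIL" (GIL = Global Interpreter Lock). The release is still a few months away (expected in October 2023), but the code is already available. So we can take an early look at how to write truly concurrent Python code with the sub-interpreter API.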

Sub-interpreters

First, let's explain how the "Per-Interpreter GIL" addresses Python's lack of proper concurrency.

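In Python, the GIL is a mutex that lets only one thread control the Python interpreter. Even if you create several threads (for example with the threading module), only one of them executes Python bytecode at any given moment.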
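You can see this for yourself by timing a CPU-bound, pure-Python function two ways: run several times in a row, then in several threads at once. This minimal sketch uses only the standard threading and time modules. The names count_down, timed_serial and timed_threads are illustrative. On a regular CPython build with the GIL, expect the threaded timing to be roughly the same as the serial one, not faster. Exact numbers depend on your machine.

```python
import threading
import time


def count_down(n):
    # Pure-Python CPU-bound loop: the running thread holds the GIL while it executes
    while n > 0:
        n -= 1
    return n


def timed_serial(n, workers):
    """Run count_down `workers` times one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def timed_threads(n, workers):
    """Run count_down in `workers` threads at once and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N, WORKERS = 1_000_000, 4
    print(f"serial:  {timed_serial(N, WORKERS):.2f}s")
    print(f"threads: {timed_threads(N, WORKERS):.2f}s")
```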

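With the "Per-Interpreter GIL", Python interpreters no longer share a single GIL. This isolation allows sub-interpreters to run at the same time. We can therefore sidestep Python's concurrency limitation by spawning additional sub-interpreters, each with its own GIL (and its own global state).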

For a more detailed explanation, see PEP 684, which describes this feature/change: https://peps.python.org/pep-0684/#per-interpreter-state

Hands-On

Installation

To use this brand-new feature, we need the latest version of Python, which means building it from source:

# https://devguide.python.org/getting-started/setup-building/#unix-compiling
git clone https://github.com/python/cpython.git
cd cpython
./configure --enable-optimizations --prefix=$(pwd)/python-3.12
make -s -j2
./python
# Python 3.12.0a7+ (heads/main:22f3425c3d, May 10 2023, 12:52:07) [GCC 11.3.0] on linux
# Type "help", "copyright", "credits" or "license" for more information.
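Once the freshly built ./python starts, it helps to check which low-level sub-interpreter modules that build actually ships. The sketch below uses importlib.util.find_spec, so it locates the private modules without importing them. The one exception is test.support.interpreters: looking it up imports its parent package, test.support. These module names are private CPython internals, not a stable API. _interpreters is included on the assumption that later versions renamed _xxsubinterpreters to it. Any of them may be missing from your build.

```python
import importlib.util
import sys

# Private module names (assumption: these may differ or be absent on your build)
CANDIDATES = ["_xxsubinterpreters", "_interpreters", "test.support.interpreters"]


def available_subinterpreter_modules():
    """Map each candidate module name to True if it can be found, else False."""
    found = {}
    for name in CANDIDATES:
        try:
            # find_spec locates a module without importing it (parent packages do get imported)
            found[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found[name] = False
    return found


if __name__ == "__main__":
    print(sys.version)
    print("3.12 or newer:", sys.version_info >= (3, 12))
    for name, ok in available_subinterpreter_modules().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```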

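Where Is the C-API?

Now that the latest version is installed, how do we actually use sub-interpreters? Can we simply import them? No. As PEP 684 notes, this is an advanced feature intended for a small subset of C-API users.

For now, the Per-Interpreter GIL is available only through the C-API, so Python developers have no direct interface to it. Such an interface is expected to arrive with PEP 554, which, if accepted, should land in Python 3.13. Until then, we have to find our own way to sub-interpreters.

Piecing together scattered bits of the CPython codebase, we have two options:

Use the _xxsubinterpreters module, which is implemented in C, hence its odd name. Because it is implemented in C, you can't easily inspect its code (at least not from Python);

Or use CPython's test module, which contains example Interpreter (and Channel) classes used for testing.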

# Choose one of these:
import _xxsubinterpreters as interpreters
from test.support import interpreters

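In the demos that follow, we will mostly use the second option. We have found the sub-interpreters, but we still need to borrow a few helper functions from Python's test module so that we can pass code to a sub-interpreter: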

from textwrap import dedent
import os

# https://github.com/python/cpython/blob/
# 15665d896bae9c3d8b60bd7210ac1b7dc533b093/Lib/test/test__xxsubinterpreters.py#L75
def _captured_script(script):
    # Wrap `script` so that its stdout is redirected into the write end of a pipe
    r, w = os.pipe()
    # Indent continuation lines to the column of {indented} below so dedent() stays consistent
    indented = script.replace('\n', '\n                ')
    wrapped = dedent(f"""
        import contextlib
        with open({w}, 'w', encoding="utf-8") as spipe:
            with contextlib.redirect_stdout(spipe):
                {indented}
        """)
    return wrapped, open(r, encoding="utf-8")


def _run_output(interp, request, channels=None):
    # Run `request` in `interp` and return everything it printed
    script, rpipe = _captured_script(request)
    with rpipe:
        interp.run(script, channels=channels)
        return rpipe.read()

Combining the interpreters module with the helpers above, we can spawn our first sub-interpreter:

from test.support import interpreters

main = interpreters.get_main()
print(f"Main interpreter ID: {main}")
# Main interpreter ID: Interpreter(id=0, isolated=None)

interp = interpreters.create()
print(f"Sub-interpreter: {interp}")
# Sub-interpreter: Interpreter(id=1, isolated=True)

# https://github.com/python/cpython/blob/
# 15665d896bae9c3d8b60bd7210ac1b7dc533b093/Lib/test/test__xxsubinterpreters.py#L236
code = dedent("""
            from test.support import interpreters
            cur = interpreters.get_current()
            print(cur.id)
            """)

out = _run_output(interp, code)

print(f"All Interpreters: {interpreters.list_all()}")
# All Interpreters: [Interpreter(id=0, isolated=None), Interpreter(id=1, isolated=None)]
print(f"Output: {out}")  # Result of 'print(cur.id)'
# Output: 1

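One way to spawn and run a new interpreter is to call the create function, then pass the interpreter together with the code to execute to the _run_output helper.

A simpler way is: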

interp = interpreters.create()
interp.run(code)

that is, to use the interpreter's run method. However, running either of the snippets above produces the following error:

Fatal Python error: PyInterpreterState_Delete: remaining subinterpreters
Python runtime state: finalizing (tstate=0x000055b5926bf398)

To avoid this error, we also need to clean up any dangling interpreters:

def cleanup_interpreters():
    for i in interpreters.list_all():
        if i.id == 0:  # main
            continue
        try:
            print(f"Cleaning up interpreter: {i}")
            i.close()
        except RuntimeError:
            pass  # already destroyed

cleanup_interpreters()
# Cleaning up interpreter: Interpreter(id=1, isolated=None)
# Cleaning up interpreter: Interpreter(id=2, isolated=None)
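Calling cleanup_interpreters() at the very end works, but it is easy to forget, and an exception halfway through would skip it. A more defensive pattern is a small context manager that always closes whatever it created. The sketch below is generic, so you can try it without a 3.12 build. The name managed is hypothetical and not part of any CPython API. The commented-out lines show how it would presumably be used with the test.support.interpreters objects from above.

```python
from contextlib import contextmanager


@contextmanager
def managed(create, close):
    """Create a resource, hand it to the with-block, and always close it afterwards."""
    resource = create()
    try:
        yield resource
    finally:
        # Runs both on normal exit and when the with-block raises
        close(resource)


# Intended use on a 3.12 build (assumption, not run here):
# with managed(interpreters.create, lambda interp: interp.close()) as interp:
#     interp.run(code)
```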

Threads

Running code with the helper functions above works, but the familiar interface of the threading module may be more convenient:

import threading

def run_in_thread():
    t = threading.Thread(target=interpreters.create)
    print(t)
    t.start()
    print(t)
    t.join()
    print(t)

run_in_thread()
run_in_thread()
# Each call prints the Thread's repr three times: initial, started, stopped

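Because we pass the interpreters.create function to Thread, the new sub-interpreter is created inside that thread.

We can also combine the two approaches and pass the helper function to threading.Thread: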

import time

def run_in_thread():
    interp = interpreters.create(isolated=True)
    t = threading.Thread(target=_run_output, args=(interp, dedent("""
            import _xxsubinterpreters as _interpreters
            cur = _interpreters.get_current()

            import time
            time.sleep(2)
            # Can't print from here, won't bubble-up to main interpreter

            assert isinstance(cur, _interpreters.InterpreterID)
            """)))
    print(f"Created Thread: {t}")
    t.start()
    return t


t1 = run_in_thread()
print(f"First running Thread: {t1}")
t2 = run_in_thread()
print(f"Second running Thread: {t2}")
time.sleep(4)  # Need to sleep to give Threads time to complete
cleanup_interpreters()

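Here we show how to use the _xxsubinterpreters module instead of the one from test.support. Each thread also sleeps for 2 seconds to simulate some "work". Note that we never call join() to wait for the threads. Instead, the main thread sleeps long enough for them to finish and then cleans up the interpreters. Calling t1.join() and t2.join() before cleanup_interpreters() would be more robust, because a fixed sleep can turn out to be too short.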
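The join-based alternative can be wrapped in a small helper that starts every job and then waits for all of them. That way, cleanup runs only after the work is really done. This sketch uses only the standard threading module, with a hypothetical helper name (run_all_in_threads) and a stand-in work function. On a 3.12 build, each job would presumably be (_run_output, (interp, script)), followed by a call to cleanup_interpreters().

```python
import threading
import time


def run_all_in_threads(jobs):
    """Start one thread per (target, args) pair, then block until every thread has finished."""
    threads = [threading.Thread(target=target, args=args) for target, args in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # returns only once that thread is done, unlike a fixed sleep
    return threads


# Stand-in for the 2-second "work" done in each sub-interpreter
results = []


def fake_work(label):
    time.sleep(0.1)
    results.append(label)


run_all_in_threads([(fake_work, ("first",)), (fake_work, ("second",))])
print(sorted(results))  # ['first', 'second']
# On a 3.12 build you would call cleanup_interpreters() at this point (assumption)
```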

Channels

Digging deeper into CPython's test module, we also find implementations of RecvChannel and SendChannel classes, which resemble channels in Go. To use them:

# https://github.com/python/cpython/blob/
# 15665d896bae9c3d8b60bd7210ac1b7dc533b093/Lib/test/test_interpreters.py#L583
r, s = interpreters.create_channel()

print(f"Channel: {r}, {s}")
# Channel: RecvChannel(id=0), SendChannel(id=0)

orig = b'spam'
s.send_nowait(orig)
obj = r.recv()
print(f"Received: {obj}")
# Received: b'spam'

cleanup_interpreters()
# Need clean up, otherwise:
# free(): invalid pointer
# Aborted (core dumped)

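This example shows how to create a channel with a receiving end (r) and a sending end (s). We pass data into the sender with send_nowait and read it on the other side with recv. This channel is effectively just another sub-interpreter, so, as before, we need to clean up once we are done.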
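If Go-style channels are new to you, you can approximate their behavior between ordinary threads with the standard queue module. The sketch below is only an analogy, not the interpreters channel API:

- make_channel, producer and consume are hypothetical names.
- put_nowait plays the role of send_nowait.
- A blocking get plays the role of recv.
- None serves as an end-of-stream sentinel.

Unlike channels between sub-interpreters, the threads here all share one interpreter and one GIL.

```python
import queue
import threading


def make_channel():
    """Return a (recv, send) pair; in this analogy both ends are the same Queue."""
    q = queue.Queue()
    return q, q


def producer(send, items):
    for item in items:
        send.put_nowait(item)  # like send_nowait(): never blocks on an unbounded queue
    send.put_nowait(None)      # sentinel: no more data


def consume(recv):
    received = []
    while (item := recv.get()) is not None:  # like recv(): blocks until data arrives
        received.append(item)
    return received


r, s = make_channel()
t = threading.Thread(target=producer, args=(s, [b"spam", b"eggs"]))
t.start()
print(consume(r))  # [b'spam', b'eggs']
t.join()
```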

Digging Deeper

Finally, to tinker with the sub-interpreter options that are normally set in C code, we can use test.support.run_in_subinterp_with_config:

import test.support

def run_in_thread(script):
    test.support.run_in_subinterp_with_config(
        script,
        use_main_obmalloc=True,
        allow_fork=True,
        allow_exec=True,
        allow_threads=True,
        allow_daemon_threads=False,
        check_multi_interp_extensions=False,
        own_gil=True,
    )

code = dedent(f"""
            from test.support import interpreters
            cur = interpreters.get_current()
            print(cur)
            """)

run_in_thread(code)
# Interpreter(id=7, isolated=None)
run_in_thread(code)
# Interpreter(id=8, isolated=None)

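This function is a Python API that calls into the underlying C function. It exposes several sub-interpreter options, such as own_gil, which specifies whether the sub-interpreter gets its own GIL.

Summary

That said, as you can see, this API is not simple to call. Unless you already have C expertise and urgently need sub-interpreters, you are better off waiting for Python 3.13. Alternatively, you can try the extrainterpreters project, which provides a friendlier Python API for using sub-interpreters.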

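梅乐佩1679 I want to extract the content of the text field from this string. How should I write the regular expression in Python? -
浦仲秒17792728233 ______
import re
s = 'face - question:检测贷款金额result:{"rc":0,"text":"检测贷款金额","service":"cn'
re.findall(r'"text":"(.*?)"', s)
# ['检测贷款金额']
# The code above runs as-is; adjust it to fit your actual data.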
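The part after result: looks like JSON, so parsing it with the json module is usually sturdier than a regex, provided you have the complete line. The sample string in the question is cut off, so the input below is a hypothetical, completed version. extract_text is an illustrative name, and the code assumes everything after "result:" is a single valid JSON object.

```python
import json


def extract_text(line):
    """Return the "text" field of the JSON object that follows "result:" (assumed format)."""
    _, sep, payload = line.partition("result:")
    if not sep:
        raise ValueError("no 'result:' marker found")
    return json.loads(payload)["text"]


# Hypothetical complete input; the sample in the question is truncated
sample = 'face - question:检测贷款金额result:{"rc":0,"text":"检测贷款金额"}'
print(extract_text(sample))  # 检测贷款金额
```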

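梅乐佩1679 How do I read a text file inside a Zip archive as CSV in Python? -
浦仲秒17792728233 ______ The bytes you end up with are not a zipfile bug. They come from Python 3 (3.5 in your case), where ZipFile.open() returns a binary file object, so readline() gives you bytes. Decode the result to get a str:
content = f.readline()
print(content.decode('utf8'))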
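To read the member as CSV instead of decoding it line by line, you can wrap the binary stream in io.TextIOWrapper and pass that to csv.reader. This is a minimal sketch that assumes the member is UTF-8 encoded. read_csv_from_zip is a hypothetical name, and zip_source can be either a path or a binary file object.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member, encoding="utf-8"):
    """Return the rows of a CSV file stored inside a zip archive."""
    with zipfile.ZipFile(zip_source) as zf:
        # zf.open() yields bytes; TextIOWrapper decodes them into str for the csv module
        with io.TextIOWrapper(zf.open(member), encoding=encoding, newline="") as text:
            return list(csv.reader(text))


# Demo with an in-memory archive instead of a file on disk
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.txt", "name,score\nAda,90\n")
buf.seek(0)
print(read_csv_from_zip(buf, "data.txt"))  # [['name', 'score'], ['Ada', '90']]
```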

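梅乐佩1679 What is set used for in Python natural language processing? -
浦仲秒17792728233 ______ If text1 is a string, set(text1) splits it into a set of unique characters. len(set(text1)) then gives the size of that set, i.e. the number of distinct characters that make up text1. In NLTK's examples, text1 is a sequence of word tokens, so the same expression counts distinct words, which is the text's vocabulary size.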
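The two cases are easiest to compare side by side. Here a plain list of tokens stands in for an NLTK Text object, an assumption that keeps the example free of dependencies.

```python
text1 = "hello"
chars = set(text1)       # unique characters; set order is arbitrary
print(len(chars))        # 4 distinct characters: h, e, l, o

tokens = "the cat saw the other cat".split()
print(len(set(tokens)))  # 4 distinct words: the, cat, saw, other
```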

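梅乐佩1679 How do I read a specific line of a text file in Python? -
浦仲秒17792728233 ______ 1. Assuming you know the line number of the "specific line":
def appoint_line(num, file):
    with open(file, "r", encoding='utf-8') as f:
        out = f.readlines()[num-1]
        return out

print(appoint_line(2, "c:/text.txt"))
The example above reads the second line of text.txt on the C: drive. 2. If by "specific line" you mean...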
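readlines() loads the whole file into memory just to return one line. For large files, a lazier approach skips ahead with itertools.islice. In this sketch, read_line is a hypothetical name. It returns None instead of raising IndexError when the file has fewer lines than requested.

```python
from itertools import islice


def read_line(path, num, encoding="utf-8"):
    """Return line `num` (1-based) including its newline, or None if the file is shorter."""
    if num < 1:
        raise ValueError("line numbers start at 1")
    with open(path, encoding=encoding) as f:
        # islice skips the first num-1 lines without storing them
        return next(islice(f, num - 1, None), None)


# Usage (hypothetical path): print(read_line("c:/text.txt", 2))
```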

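梅乐佩1679 Write a Python program that counts how many times a word appears in an article -
浦仲秒17792728233 ______ I'm not sure whether you want to match "a." or exclude that case. Assume the string text already holds the contents of your txt file (the open/readline code is omitted):
import re
c1 = re.findall(r'\ba\b', text)  # includes "a."; note that " a.a " is matched twice, because "." is a word boundary just like " "
c2 = re.findall(r'\ba\s', text)  # excludes "a."
# For "an" or "the", use r'\ban\b' and r'\bthe\b' instead
# Finally, the number of elements in c1 is the number of matches:
len(c1)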
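The snippet above does not handle capitalization ("A" at the start of a sentence), and it counts only one word at a time. This sketch covers both, assuming a case-insensitive count is what you want. count_word and word_frequencies are illustrative names, and \w+ is a simple stand-in for a real tokenizer.

```python
import re
from collections import Counter


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # re.escape keeps characters such as "." in `word` from acting as regex syntax
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def word_frequencies(text):
    """Lower-case the text and count every run of word characters."""
    return Counter(re.findall(r"\w+", text.lower()))


text = "A cat saw a dog and a bird."
print(count_word(text, "a"))                  # 3
print(word_frequencies(text).most_common(1))  # [('a', 3)]
```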

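梅乐佩1679 The two kinds of triple quotes in Python -
浦仲秒17792728233 ______ 1> A bare ''' content ''' is a string literal that is evaluated and then discarded, so it is commonly used like a multi-line comment. As the first statement of a module, class or function, it becomes the docstring. 2> path = ''' content ''' assigns the multi-line string to a variable. Triple double quotes (""") work exactly the same way as triple single quotes (''').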
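A short demonstration of both points: a triple-quoted string in docstring position is stored on the function, and the two quote styles build identical strings.

```python
def greet():
    '''Return a greeting.'''  # first statement, so it becomes greet.__doc__
    return "hi"


single = '''line one
line two'''
double = """line one
line two"""

print(greet.__doc__)     # Return a greeting.
print(single == double)  # True: both quote styles build the same str
```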

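梅乐佩1679 What does the quote function mean in Python, and how is it used? -
浦仲秒17792728233 ______ Do you mean quote from the urllib library? In Python 2.x it is called as urllib.quote(text); in Python 3.x it is urllib.parse.quote(text). By the standard, a URL may contain only a subset of ASCII characters (letters, digits and some symbols). Other characters, such as Chinese characters, do not conform to the URL standard. So using ... in a URL...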
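Here is a quick illustration of the Python 3 version. quote() percent-encodes the UTF-8 bytes of every unsafe character. By default it leaves "/" alone, because its safe parameter defaults to "/". unquote() reverses the process.

```python
from urllib.parse import quote, unquote

encoded = quote("中文")          # UTF-8 encode, then percent-escape each byte
print(encoded)                   # %E4%B8%AD%E6%96%87
print(quote("a b/c"))            # a%20b/c  ("/" is safe by default)
print(quote("a b/c", safe=""))   # a%20b%2Fc
print(unquote(encoded))          # 中文
```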

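梅乐佩1679 When I read a Chinese cell from Excel in Python, it has a text: prefix. How do I get rid of it? -
浦仲秒17792728233 ______ That prefix shows the cell's type and is not part of its content; it looks like the repr of an xlrd Cell. If the object you printed is cell1, use cell1.value to get the content itself.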

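梅乐佩1679 How do I extract words from a txt file in Python and print them? -
浦仲秒17792728233 ______ I didn't quite follow your question, so I'll assume you want to take the dictionary {"course_id": 14, "text": "我要学习python."} and print: 我要学习 python (我要学习 means "I want to learn")
--------------------------------------
dic = {"course_id": 14, "text": "我要学习python."}
list = []
length = len(dic["text"])
...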
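The rest of the answer is cut off, so here is one hypothetical way to get that output. A regular expression splits the string into runs of Chinese characters and runs of Latin letters, dropping punctuation, and the runs are joined with a space. The range U+4E00 to U+9FFF (CJK Unified Ideographs) is an assumption: it covers common Chinese characters but not every CJK block. split_cjk_and_latin is an illustrative name.

```python
import re


def split_cjk_and_latin(text):
    # Runs of CJK Unified Ideographs or of ASCII letters; anything else (e.g. ".") is dropped
    return re.findall(r"[\u4e00-\u9fff]+|[A-Za-z]+", text)


dic = {"course_id": 14, "text": "我要学习python."}
print(" ".join(split_cjk_and_latin(dic["text"])))  # 我要学习 python
```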

(Editor: self-media)