How Python Coroutines Work, and I/O Multiplexing

A recommended prompt website: https://snackprompt.com/. The questions below use prompts from it to get more structured answers.

The underlying principle that lets Python coroutines achieve high concurrency

Using the prompt "Get ChatGPT4 to efficiently teach you difficult / advanced technical concepts":

Python’s concurrency, particularly through coroutines, is a powerful feature that allows high concurrency with minimal overhead. Here’s a breakdown of the underlying principles:

  1. Understanding Coroutines:

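    1. Coroutines are generalizations of subroutines. A subroutine runs to completion once called; a coroutine can suspend itself partway through, hand control back to its caller, and later resume from the same point. Interleaving many suspended coroutines is what lets tasks run concurrently.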
    2. This is aided by the asyncio library in Python, which provides a framework for writing single-threaded concurrent code using coroutines.
  2. Event Loop:

    1. At the heart of coroutine-based concurrency is the event loop. It runs the registered tasks (coroutines) one at a time. On Unix-like systems it uses I/O multiplexing (select, poll, epoll, or kqueue, through Python's selectors module) to find out which waiting tasks can make progress, so it needs no additional threads or processes.
  3. Non-blocking I/O Operations:

    1. Coroutines handle I/O-bound work, such as high-level structured network code, through non-blocking operations. When a coroutine performs an I/O operation, other tasks can run. When the operation completes, the coroutine is resumed.
  4. Task Scheduling:

    1. Instead of creating a new thread for each task, the event loop schedules multiple tasks to run on the same thread. This lightweight task management means less overhead compared to threads and processes.
  5. Yielding Control:

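    1. await expressions are the points where a coroutine can yield control back to the event loop (it actually suspends only if the awaited operation is not yet complete), allowing other tasks to run. Because switches happen only at these explicit points, this cooperative multitasking is simpler to reason about than preemptive multitasking, where a thread can be interrupted anywhere. The flip side is that a coroutine that runs a long computation or a blocking call without awaiting stalls every other task.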
  6. Memory Efficiency:

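    1. Coroutines are much more memory efficient than threads. Python coroutines are stackless: a suspended coroutine keeps only its frame object (local variables and the point to resume from) on the heap, instead of a dedicated thread stack, for which the OS typically reserves megabytes of virtual memory. Switching between coroutines is an ordinary return and resume inside one thread, so it needs no kernel context switch.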
  7. Concurrency Patterns:

    1. Advanced patterns like producer-consumer, pipelines, and fan-out/fan-in can be implemented using coroutines to effectively handle high concurrency scenarios with low latency.

Understanding these principles helps you leverage Python’s concurrency through coroutines to achieve high throughput in applications with limited resources.

Follow-up: why the OS does not suspend the thread during I/O

“When a coroutine performs an I/O operation, other tasks can run. When the operation completes, the coroutine is resumed.”

Suppose the thread running the coroutines is A, and another thread, such as a UI thread, is B.

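How does it detect that a task is doing I/O and switch to another task? From the OS's point of view (for example, Linux), however many tasks there are, the kernel sees only one thread. So when one task does I/O, the kernel sees that thread doing I/O. Shouldn't the kernel then suspend thread A, which contains the coroutines, and switch in another thread such as B? Why does it let thread A go on running the other tasks?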

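In Python's coroutine model, and with the asyncio library in particular, switching between coroutines does not rely on the operating system's thread scheduler. Instead, an event loop manages and schedules the tasks. This means concurrency is possible even within a single thread.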

Event loop and non-blocking I/O

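  1. Event loop: the event loop is the core of coroutine concurrency. It schedules and runs coroutine tasks. When a coroutine reaches an await expression (typically an I/O operation) that cannot complete yet, it hands control back to the event loop instead of blocking the thread.
  2. Non-blocking I/O: I/O in coroutines is non-blocking. While a coroutine waits for an I/O operation to complete, the thread is not blocked; the event loop goes on running other coroutines that are ready.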

Interaction between the kernel and the thread

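At the OS level (for example, on Linux), a thread that makes a blocking I/O call whose data is not ready yet is put to sleep and taken off the CPU until the data arrives. asyncio avoids this by putting its sockets in non-blocking mode: an operation that cannot complete right away returns immediately (with EAGAIN) instead of sleeping. The event loop then waits for readiness on all registered descriptors at once, using system calls such as select, poll, or epoll.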

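  • When a coroutine's non-blocking I/O cannot complete immediately, the event loop registers the descriptor with its selector and continues with other tasks.
  • Once the descriptor becomes ready, the multiplexing call reports it, and the event loop resumes the suspended coroutine. The coroutine then performs the actual read or write.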

Coroutines vs. threads

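  • Coroutines: managed in user space and scheduled by the event loop. They share one thread, so switching between them needs no OS context switch and costs little.
  • Threads: managed by the operating system. Switching between them involves kernel-level scheduling and context switches, which cost more.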

Thread A and thread B

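In your example, thread A runs the event loop and schedules the coroutines, while thread B runs something else (such as the UI). When a coroutine in A waits for I/O, the event loop keeps scheduling other coroutines, so the kernel has no reason to put A to sleep. Only when no coroutine in A is ready does the loop call select/poll/epoll with a timeout. A then actually blocks inside that call, and the kernel schedules other threads such as B. The kernel can still preempt A when its time slice runs out, as with any thread, and on a multi-core machine B may run in parallel anyway. What the kernel never does is put A to sleep because of one coroutine's pending I/O.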

In this way, Python coroutines achieve efficient concurrency within a single thread, without relying on multiple threads.

Follow-up: how select, poll, and epoll work

Using the prompt "CompSciGPT computer science assistant":

In the realm of operating systems and network programming, handling multiple simultaneous I/O operations is crucial. This is where system calls like select, poll, and epoll come into play, particularly in environments like server development. Let’s explore the differences and use cases for each:

Select

  • Overview: select is one of the oldest I/O multiplexing options available in Unix-based systems.

  • Functionality: It allows a program to monitor multiple file descriptors to see if any of them is ready for I/O (e.g., ready to read, write, or has an exceptional condition pending).

  • Limitations:

    • Scalability: The descriptor sets are fixed-size bitmaps of FD_SETSIZE bits (1024 on Linux with glibc), so descriptors numbered FD_SETSIZE or higher cannot be monitored. Some systems allow raising the limit at compile time.
    • Performance: Every call copies the sets into the kernel, and the kernel walks through all descriptors up to the highest one to check their states. This becomes inefficient with a large number of file descriptors.
    • FD_SET Management: select overwrites the sets in place with its results, so you must rebuild them before every call.

Poll

  • Overview: poll was introduced to overcome some of the select limitations.

  • Functionality: Similar to select, but it takes an array of struct pollfd entries instead of fixed-size bitmaps, so the number of descriptors is limited only by the process's open-file limit.

  • Advantages:

    • No Hard Limit: Unlike select, poll does not have a hardcoded limit on the number of file descriptors it can monitor.
    • Simplified Interface: Instead of managing multiple fd sets, poll uses a single array of pollfd structures.
  • Limitations:

    • Performance: Like select, it is O(n) per call: the whole array is copied into the kernel and scanned on every call, which becomes expensive as the number of descriptors grows.

Epoll

  • Overview: epoll is specific to Linux and designed to be more efficient and scalable than both select and poll.

  • Functionality: It allows management of a large number of file descriptors with better performance characteristics.

  • Advantages:

    • Scalability: Ideal for applications with a high number of file descriptors (thousands or more).
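    • Efficiency: The kernel keeps a ready list per epoll instance and adds a descriptor to it when the descriptor's state changes, so epoll_wait returns only ready descriptors instead of scanning them all. Its cost depends on the number of ready descriptors, not the number being watched.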
    • Edge/Level Triggering: Offers two modes (edge-triggered and level-triggered) giving developers finer control over the handling of ready events.
    • Persistent Interest: Once you register interest in certain events, epoll keeps the interest until explicitly changed or removed.
  • Limitations:

    • Complexity: Programming with epoll can be more complex, requiring a good understanding of its behavior, especially with edge-triggered semantics.

Choosing the Right System Call

  • Select: Its limits make it a poor fit for servers with many connections, but it is the most portable option and remains suitable for smaller, less demanding applications.
  • Poll: A reasonable choice when you need more capability than select offers, without needing the complexity of epoll.
  • Epoll: The preferred choice for high-performance server applications on Linux due to its scalability and efficient event notification handling.

Practical Considerations

  • When designing networked applications, especially servers, the choice between select, poll, and epoll can significantly impact performance.
  • Understand the workload and environment specifics to make an informed decision. For example, cross-platform applications might prefer poll for its broader compatibility.
  • Always consider the trade-offs between complexity and performance. While epoll offers great performance, it requires careful handling of edge cases in event management.

In conclusion, a solid understanding of select, poll, and epoll not only helps in choosing the right tool for your application needs but also aids in optimizing the application’s performance in a networked infrastructure.

select, poll, and epoll code

The following examples show how to use select, poll, and epoll for file I/O. Because file operations are simpler to set up than networking, each example performs a non-blocking read on a file.

Common Setup

First, a helper that opens the file to work with:

#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int setup_file(const char *filename) {
    int fd = open(filename, O_RDONLY | O_NONBLOCK);
    if (fd == -1) {
        perror("open");
    }
    return fd;
}

/**
 * When a file is opened with the O_NONBLOCK flag, subsequent operations on
 * the descriptor (such as read or write) do not block, provided the file type
 * supports non-blocking behavior. Pipes, FIFOs, terminals, and sockets do;
 * for regular files and block devices the flag has no effect.
 */

This setup_file function will open a file in non-blocking mode, which is essential for non-blocking I/O operations.

Select Example with File I/O

#include <sys/select.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int setup_file(const char *filename);  // defined in the common setup above

int main() {
    int fd = setup_file("example.txt");
    if (fd == -1) return 1;

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    struct timeval tv;
    tv.tv_sec = 5;  // Wait for up to 5 seconds
    tv.tv_usec = 0;

    int retval = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (retval == -1) {
        perror("select");
    } else if (retval) {
        if (FD_ISSET(fd, &readfds)) {
            printf("File is ready for reading.\n");
            // Read from the file
            char buffer[1024];
            ssize_t bytes_read = read(fd, buffer, sizeof(buffer));
            if (bytes_read > 0) {
                printf("Read %zd bytes: %.*s\n", bytes_read, (int)bytes_read, buffer);
            }
        }
    } else {
        printf("No data read within five seconds.\n");
    }

    close(fd);
    return 0;
}

Poll Example with File I/O

#include <poll.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int setup_file(const char *filename);  // defined in the common setup above

int main() {
    int fd = setup_file("example.txt");
    if (fd == -1) return 1;

    struct pollfd fds[1];
    fds[0].fd = fd;
    fds[0].events = POLLIN;

    int retval = poll(fds, 1, 5000);  // Wait for up to 5 seconds

    if (retval == -1) {
        perror("poll");
    } else if (retval > 0) {
        if (fds[0].revents & POLLIN) {
            printf("File is ready for reading.\n");
            // Read from the file
            char buffer[1024];
            ssize_t bytes_read = read(fd, buffer, sizeof(buffer));
            if (bytes_read > 0) {
                printf("Read %zd bytes: %.*s\n", bytes_read, (int)bytes_read, buffer);
            }
        }
    } else {
        printf("No data read within five seconds.\n");
    }

    close(fd);
    return 0;
}

Epoll Example with File I/O

#include <sys/epoll.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_EVENTS 5

int setup_file(const char *filename);  // defined in the common setup above

int main() {
    int fd = setup_file("example.txt");
    if (fd == -1) return 1;

    int epoll_fd = epoll_create1(0);
    if (epoll_fd == -1) {
        perror("epoll_create1");
        exit(EXIT_FAILURE);
    }

    struct epoll_event event;
    event.events = EPOLLIN;
    event.data.fd = fd;
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &event) == -1) {
        // Fails with EPERM if example.txt is a regular file, because epoll
        // does not support regular files; a FIFO (mkfifo example.txt) works.
        perror("epoll_ctl");
        exit(EXIT_FAILURE);
    }

    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epoll_fd, events, MAX_EVENTS, 5000);  // Wait for up to 5 seconds
    if (n == -1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
    }

    for (int i = 0; i < n; i++) {
        if (events[i].data.fd == fd) {
            printf("File is ready for reading.\n");
            // Read from the file
            char buffer[1024];
            ssize_t bytes_read = read(fd, buffer, sizeof(buffer));
            if (bytes_read > 0) {
                printf("Read %zd bytes: %.*s\n", bytes_read, (int)bytes_read, buffer);
            }
        }
    }

    close(epoll_fd);
    close(fd);
    return 0;
}

Explanation

  • File Descriptor: Each example uses setup_file from the common setup to open a file named example.txt in non-blocking mode. Compile each example together with setup_file, and make sure example.txt exists.
  • Select, Poll, Epoll: These functions check if the file is ready for reading within a 5-second window.
  • Reading from the File: Once data is available, the programs proceed to read it using read().

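These examples use select, poll, and epoll in the same way they would be used for sockets. For regular files, however, select and poll always report the descriptor as readable, and epoll_ctl rejects a regular file with EPERM. The calls therefore only wait meaningfully on pipes, FIFOs, terminals, and sockets.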

Does Select Map File Contents to Memory?

  • No Direct Mapping: The select system call itself does not map file contents to memory. Its purpose is strictly to monitor the readiness of file descriptors for I/O operations.
  • Content Access: It does not involve reading or writing file contents. Once select indicates that a file descriptor is ready, you still need to perform actual read or write operations on the file descriptor to access or modify content.

For regular files, select accepts the descriptor but always reports it as ready: reads from disk files never count as blocking in the readiness sense, and O_NONBLOCK has no effect on them. Readiness notification is therefore useful mainly where I/O can genuinely block, such as network sockets, pipes, and terminals. This is why it matters most for network and inter-process communication.

Author

李三(cl0und)

Posted on

2024-08-21

Updated on

2024-09-14

Licensed under