Overall, these are the main points:
- Pray, then read the documentation, comments, and issues as the primary means of understanding code
- Read the code and debug as an aid to understanding it
- Use debugging as the primary means of verifying your understanding
- Write up each day's understanding at the end of the day
- If you don't understand something, leave yourself an impression of it and skip ahead
Why pray?
Because authors don't necessarily care much about documentation and comments: for most of them a correct implementation is the main goal. So keep your fingers crossed that the documentation and comments are correct, or at least mostly correct.
Why not plan for the case where documentation and comments are missing? The reason is simple: if a project is not well known, the author has little reason to maintain them, so it is not unusual for them to be absent or incomplete.
So pick a well-known project.
Why read the documentation?
If the documentation is of sufficiently high quality, it gives an overview of each part of the code, or at least of the key parts, and it uses the same names as the variables in the code.
Take the redis backlog: you can grep the code for the corresponding variables, and from the matches you can make an educated guess about which functions operate on the backlog. That alone helps you understand the code.
Why read issues?
An issue is a user making a request to the author, so it at least explains why a feature exists.
And since anyone who files an issue is most likely a programmer, authors and users end up discussing the reasons behind a feature or bug and its design details.
For example, redis creates its epoll instance with epoll_create instead of epoll_create1, and the reason is compatibility with older kernel versions.
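To make the difference concrete, here is a minimal sketch of the two calls on Linux. It is not redis's code: the helper names create_epoll_compat, create_epoll_modern, and is_cloexec are made up for illustration. What I am sure of is that epoll_create1 only exists since Linux 2.6.27, while epoll_create is older and takes a size hint that has been ignored since 2.6.8 but must still be greater than zero.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: the older-kernel-friendly path.
 * epoll_create() cannot take flags, so close-on-exec is set by hand. */
static int create_epoll_compat(void) {
    int fd = epoll_create(1024); /* size is only a hint, but must be > 0 */
    if (fd == -1) return -1;
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: the newer path.
 * epoll_create1() sets close-on-exec in the same call. */
static int create_epoll_modern(void) {
    return epoll_create1(EPOLL_CLOEXEC);
}

/* Returns 1 if the descriptor has FD_CLOEXEC set, 0 otherwise. */
static int is_cloexec(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}
```

Both helpers end up with an equivalent descriptor. The compat version just needs two extra system calls and is not atomic, which is the kind of trade-off you would only learn about from the issue discussion.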
Why organize?
The reason is simple: code is complex and people tend to forget things.
Never mind reading code, think about writing it. You write a function today, and two weeks later you look at it and think, "Who wrote this shit?"
Don't trust the scratch paper in your hand to keep a clear record of everything. First, pen and paper make it hard to revise what you have written, and you may end up cramming a lot into a very small space. Second, scratch paper is there to help you make sense of one small piece at a time, so it gets messy, and when the next day comes around you never want to look at it again.
I used to be a big fan of scratch paper, with the result that although I wrote a lot, I pretty much started from scratch the next day. Just look at how much scratch paper I used, with the pen there for scale.
How to organize it? First, use an electronic tool: draw the execution flowchart, and next to it note at least which function does what, down to the exact line if you can. Then write down what problem you worked on today and what you will work on next. If you have a website, remember to post it there as well.
Remember to save a copy of your breakpoints too, if you are still using them.
For what you don't understand
Because you don't yet understand the whole project while reading, some parts will be hard to follow, so consider skipping them.
How do you leave an impression of what you don't understand?
First, if you can, read the comments, or search the documentation by keyword, to get a general idea of what the thing is for. For example, where redis reuses an RDB, the comment says that this is a replication buffer. But you don't know what this buffer is or what it does. After looking into it and into the code that manipulates the backlog, I learned that this is where the history of replicated commands is kept.
    /* Perfect, the server is already registering differences for
     * another slave. Set the right state, and copy the buffer.
     * We don't copy buffer if clients don't want. */
    if (!(c->flags & CLIENT_REPL_RDBONLY))
        copyReplicaOutputBuffer(c,slave);
But what if you still don't understand it? Then there is nothing to do but record exactly where, and in which file, the part you don't understand is.
Understanding the Basics
When reading the code, you have to understand at least the most basic parts of the whole program, such as the threading model. You also have to find the pieces that are obviously generic, such as the encapsulation of TCP connections. If you don't read the threading model, you won't know when one piece of processing is finished or what happens next.
Take PING/PONG: the server receives a PING and then sets up a write callback to send the PONG. Without understanding the threading model, it is easy to overlook this.
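The PING/PONG flow above can be sketched as a tiny single-threaded loop. This is only an illustration under my own assumptions, not redis's implementation: the conn struct and the functions on_readable, on_writable, and loop_once are hypothetical, and poll() stands in for whatever multiplexer the real project uses. The point is that the read callback only queues the reply, and the reply is actually sent on a later pass through the loop.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical connection object, not redis's real struct. */
typedef struct conn {
    int fd;
    char out[64];    /* pending reply bytes */
    size_t out_len;
    int want_write;  /* 1 while the write callback is installed */
} conn;

/* Read callback: parse the request and only queue the reply. */
static void on_readable(conn *c) {
    char buf[64];
    ssize_t n = read(c->fd, buf, sizeof(buf));
    if (n >= 4 && memcmp(buf, "PING", 4) == 0) {
        memcpy(c->out, "+PONG\r\n", 7);
        c->out_len = 7;
        c->want_write = 1; /* install the write callback */
    }
}

/* Write callback: flush what was queued, then uninstall itself. */
static void on_writable(conn *c) {
    assert(c->out_len > 0); /* only installed while data is pending */
    ssize_t n = write(c->fd, c->out, c->out_len);
    if (n > 0) {
        memmove(c->out, c->out + n, c->out_len - (size_t)n);
        c->out_len -= (size_t)n;
    }
    if (c->out_len == 0) c->want_write = 0;
}

/* One pass of the loop: wait for events, then dispatch the callbacks.
 * Returns how many callbacks ran. */
static int loop_once(conn *c) {
    struct pollfd p = { .fd = c->fd, .events = POLLIN };
    if (c->want_write) p.events |= POLLOUT;
    if (poll(&p, 1, 100) <= 0) return 0;
    int fired = 0;
    if (p.revents & POLLIN) { on_readable(c); fired++; }
    if (p.revents & POLLOUT) { on_writable(c); fired++; }
    return fired;
}
```

If you only read on_readable, you never see a write to the socket, and it looks as if the PONG is never sent. You have to know that the loop comes back around and calls on_writable before the flow makes sense.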
Of course there are other things to keep in mind, like timers: make sure you know the author's design. I personally don't recommend the redis timer implementation, because it has to do too many things. A single serverCron manages the cluster, handles master-slave replication, updates the server's own state, and is also responsible for reconnection.
Of course, for the time being I haven't figured out what kind of timer is both easy to write and easy to read.
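One direction that might be worth trying, offered only as a sketch and not as an answer, is to give each periodic duty its own entry in a table instead of putting everything into one cron function. Everything here is hypothetical: the timer_job struct, the job names, the periods, and run_due_jobs are made up for illustration and have nothing to do with how redis actually schedules its work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job table: one entry per periodic duty. */
typedef struct timer_job {
    const char *name;
    long long period_ms;   /* how often the job should run */
    long long next_run_ms; /* earliest time of the next run */
    void (*run)(void);
} timer_job;

/* Stand-in jobs that only count how often they were called. */
static int replication_runs, cluster_runs, stats_runs;
static void replication_tick(void) { replication_runs++; }
static void cluster_tick(void)     { cluster_runs++; }
static void stats_tick(void)       { stats_runs++; }

/* The periods below are invented for the example. */
static timer_job jobs[] = {
    { "replication", 1000, 0, replication_tick },
    { "cluster",      100, 0, cluster_tick },
    { "stats",        100, 0, stats_tick },
};

/* Run every job whose deadline has passed; returns how many ran. */
static int run_due_jobs(long long now_ms) {
    int ran = 0;
    for (size_t i = 0; i < sizeof(jobs) / sizeof(jobs[0]); i++) {
        timer_job *j = &jobs[i];
        assert(j->period_ms > 0);
        if (now_ms >= j->next_run_ms) {
            j->run();
            j->next_run_ms = now_ms + j->period_ms;
            ran++;
        }
    }
    return ran;
}
```

At least the table tells a reader at a glance which duties exist and how often each one runs. Whether it stays readable once the jobs start depending on each other, I can't say.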