glibc-malloc源码阅读二

2022-03-30 约 8638 字预计阅读 18 分钟

在上一篇中，存留了一些疑问:

tcache
fastbin
smallbin
unsorted_bin
以上如何管理？
以上什么关系？
bin是怎么初始化的？
bin是怎么扩充的，可以扩充吗？

bin

malloc_chunk

首先要清楚bin中的元素是什么, bin中的元素是memory chunk, 其定义如下:

1
2
3
4
5
6
7
8
9
struct malloc_chunk {
  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead. */
  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;
  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

mchunk_prev_size代表上一个chunk的大小
mchunk_size 代表当前chunk的大小
fd/bk双向链表的前后指针
fd_nextsize/bk_nextsize针对large chunk, 双向链表的前后指针

通过chunk的结构我们可以知道, chunk是通过链表链接起来的, 至于是单向链表还是双向链表就需要再看实现了.

注意到注释部分, 有些成员会标注是"if free", 表示这一项在chunk是free的时候才有效, 比如malloc之后, 这一项就是无效的, 这是用来复用的. malloc返回给用户的内存可以访问到这些区域, 比如mchunk_prev_size/fd_nextsize等, 但是诸如mchunk_size这块内存是不允许用户访问的(用户要是访问了, 修改了, 那glibc就无法正常管理了).

chunk结构

malloc.c中对于此内存结构有如下表示:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if unallocated (P clear)  |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                     |A|M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             (size of chunk, but used for application data)    |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of next chunk, in bytes                |A|0|1|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

通过这个图, 又可以了解到一些信息:

当前chunk加上mchunk_size则可以得到下一个chunk
当前chunk减去mchunk_prev_size则可以得到上一个chunk
mchunk_prev_size是可能被上一个chunk返回给用户的
fd及一下等, 是可能被当前chunk返回给用户的

A|M|P是什么:

A代表NON_MAIN_ARENA, 表示chunk的arena是不是main_arena, 如果是main_arena, 则A = 0, 否则A = 1
M代表IS_MMAPPED, 表示当前chunk是不是通过mmap申请的
P代表PREV_INUSE, 表示当前chunk的上一个chunk是不是被使用的, 如果是, 则上一个被使用, mchunk_prev_size也无效, 否则mchunk_prev_size代表上一个chunk的size, 第一个chunk节点的P总是1

等等!!上文说的上一个chunk是指? 链表的上一个吗? 可以参考这篇, 所以我理解的上一个chunk是指连续内存上的上一个chunk, 因为bin中的chunk可能不在相邻内存. 如图示:

https://bu.dusays.com/2022/06/26/62b87a94bbe46.png — bin-chunk

malloc_state

再看看bin是什么, 在上一篇的内容中, 我们有提到一个参数av, bin被存放在av中, av是一个malloc_state指针, malloc_state的内容如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
struct malloc_state
{
  /* Serialize access.  */
  __libc_lock_define (, mutex);
  /* Flags (formerly in max_fast).  */
  int flags;
  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;
  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];
  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;
  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;
  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];
  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];
  /* Linked list */
  struct malloc_state *next;
  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;
  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;
  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

类型mfastbinptr和mchunkptr都是malloc_chunk的指针, 在malloc_state中, 我们可以看到有fastbin和bin, 以及表示fastbin是否存在的have_fastchunks, 但是没有看到smallbin/largebin这些结构. 实际上, smallbin/largebin是包含在bin中的. 我的理解是, smallbin和largebin其实可以看作是同一种bin, 所以归类在一起, 但是fastbin和其他bin会有一些区别, 所以不算在普通bin类.

另外还注意到next和next_free两个成员, 由此也可以推断出malloc_state是某个链表的元素, 并且会有类似free list的存在.

这里可以看到main_arena的初始化, 它的next指向了自己, 是一个环状链表的结构:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
/* There are several instances of this struct ("arenas") in this
   malloc.  If you are adapting this malloc in a way that does NOT use
   a static or mmapped malloc_state, you MUST explicitly zero-fill it
   before using. This malloc relies on the property that malloc_state
   is initialized to all zeroes (as is true of C statics).  */
static struct malloc_state main_arena =
{
  .mutex = _LIBC_LOCK_INITIALIZER,
  .next = &main_arena,
  .attached_threads = 1
};

arena的扩展

arena如何扩展? 在malloc里面我们已经见过arena_get函数, 如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
#define arena_get(ptr, size) do { \
      ptr = thread_arena;                                                      \
      arena_lock (ptr, size);                                                      \
  } while (0)

#define arena_lock(ptr, size) do {                                              \
      if (ptr)                                                                      \
        __libc_lock_lock (ptr->mutex);                                              \
      else                                                                      \
        ptr = arena_get2 ((size), NULL);                                      \
  } while (0)

首先是获取当前线程的thread_arena, 如果没有, 那么尝试arena_get2. thread_arena是什么? 在ptmalloc_init填充了thread_arena, 指向了main_arena:

1
2
//ptmalloc_init (void)
thread_arena = &main_arena;

但是我们还需要注意, __malloc_hook只有在第一次调用malloc的时候才会被调用, 此后__malloc_hook会被赋空, 也就是说对应的ptmalloc_init只会被调用一次, 只有在进程启动的时候, 第一个线程的thread_arena才会被赋值为&main_arena, 而其他线程此值为空.

在malloc的时候, 还有这么一段:

1
2
3
4
5
6
7
if (SINGLE_THREAD_P)
{
  victim = _int_malloc (&main_arena, bytes);
  assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
          &main_arena == arena_for_chunk (mem2chunk (victim)));
  return victim;
}

这说明, 在单线程环境的时候, 直接使用main_arena就行了, 多线程环境, 不同的线程则可能使用不同的arena, 这是为了加速, 减少互斥锁而使用的.

再继续arena_get2, 它做的第一件事就是尝试从一个free的链表中获取, 如果获取到了, 那就返回了, 如果没获取到, 则尝试创建一个新的arena:

1
2
3
4
5
6
7
8
9
a = get_free_list ();
if (a == NULL)
{
  //......
  a = _int_new_arena (size);
  //......
  a = reused_arena (avoid_arena);
  //......
}

当然, 是不是可以无限制的创建arena? 不是的, 会和当前系统有关. _int_new_arena的作用就本就是mmap一块heap, 然后改造成arena结构, 再将thread_arena指向这块新arena, 原先的arena则会被detach掉(detach的作用就是当前arena的引用计数器attached_threads减一), 但是目前还没有回收到free表中:

1
2
3
4
5
6
mstate replaced_arena = thread_arena;
thread_arena = a;
//......
__libc_lock_lock (free_list_lock);
detach_arena (replaced_arena);
__libc_lock_unlock (free_list_lock);

新arena怎么和arena链表关联起来呢? 如下, 新arena总是插在链表的"颈部", 而链表"尾部"的元素总是指向"头部"的main_arena. 所以, 这类似于一个FIFO的队列:

1
2
3
4
5
6
7
/* Add the new arena to the global list.  */
a->next = main_arena.next;
/* FIXME: The barrier is an attempt to synchronize with read access
    in reused_arena, which does not acquire list_lock while
    traversing the list.  */
atomic_write_barrier ();
main_arena.next = a;

arena的复用

或者, 如果没办法创建新的arena了(和系统有关), 那么就会考虑复用, 调用reused_arena, 大致流程是:

获取arena链表头部节点:

1
2
3
4
/* FIXME: Access to next_to_use suffers from data races.  */
static mstate next_to_use;
if (next_to_use == NULL)
  next_to_use = &main_arena;

移动到尾部节点:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
/* Iterate over all arenas (including those linked from
    free_list).  */
result = next_to_use;
do
  {
    if (!__libc_lock_trylock (result->mutex))
      goto out;
    /* FIXME: This is a data race, see _int_new_arena.  */
    result = result->next;
  }
while (result != next_to_use);

将尾部节点从free list中移除, 如果free list中有这个元素的话. 然后在对对应的arena计数器加一:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
/* Attach the arena to the current thread.  */
{
  /* Update the arena thread attachment counters.   */
  mstate replaced_arena = thread_arena;
  __libc_lock_lock (free_list_lock);
  detach_arena (replaced_arena);
  /* We may have picked up an arena on the free list.  We need to
      preserve the invariant that no arena on the free list has a
      positive attached_threads counter (otherwise,
      arena_thread_freeres cannot use the counter to determine if the
      arena needs to be put on the free list).  We unconditionally
      remove the selected arena from the free list.  The caller of
      reused_arena checked the free list and observed it to be empty,
      so the list is very short.  */
  remove_from_free_list (result);
  ++result->attached_threads;
  __libc_lock_unlock (free_list_lock);
}

将当前线程的arena设置为上述找到的尾节点的arena:

1
2
3
LIBC_PROBE (memory_arena_reuse, 2, result, avoid_arena);
thread_arena = result;
next_to_use = result->next;

以上, 就可以复用过去的某个arena了, 有几点注意:

每次都是找的最后一个节点? 我认为是因为这是一个FIFO队列, 所以尾节点在free list的概率比较高, 或者尾节点尽快free的概率比较高
如果目标arena在free list, 那正好拿来用, 无需考虑竞争
如果目标arena不在free list呢? 这时候就是真的复用了, 多线程环境则可能存在阻塞
计数器attached_threads在调用free的时候有用, 只有没有消费者了, 才允许塞入free list
所有arena都被塞入了一个环形链表, free arena再有一个free链表维护

free arena

free链表则在线程退出的时候会被填充, 线程退出时, 如果其arena的引用计数为0, 则会交还其arena到进程的free list, 因此, 如果有其他线程需要使用, 并且free list中的元素满足要求时, 就不再需要重新mmap了(回到上面的get_free_list):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
void
__malloc_arena_thread_freeres (void)
{
  /* Shut down the thread cache first.  This could deallocate data for
     the thread arena, so do this before we put the arena on the free
     list.  */
  tcache_thread_shutdown ();
  mstate a = thread_arena;
  thread_arena = NULL;
  if (a != NULL)
    {
      __libc_lock_lock (free_list_lock);
      /* If this was the last attached thread for this arena, put the
         arena on the free list.  */
      assert (a->attached_threads > 0);
      if (--a->attached_threads == 0)
        {
          a->next_free = free_list;
          free_list = a;
        }
      __libc_lock_unlock (free_list_lock);
    }
}

小结

以上:

一个线程的malloc维护了一个arena
arena中维护了多个bin
bin是链表结构
bin的元素是chunk
chunk会考虑内存复用, 尽可能节约内存
malloc_state(arena)是某个环形链表结构的元素
所有arena被一个链表维护(无论busy或free), 此外会有free链表维护free的arena
新arena插在链表头部
线程退出时会交还其arena到进程的free list中
如果不允许创建新的arena则会复用尾节点的arena

fastbin

先来看free是怎么工作的:

一下是free的一部分, 如果是mmap的内存, 那么unmmap就好了:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
p = mem2chunk (mem);
if (chunk_is_mmapped (p))                       /* release mmapped memory. */
{
  /* See if the dynamic brk/mmap threshold needs adjusting.
      Dumped fake mmapped chunks do not affect the threshold.  */
  if (!mp_.no_dyn_threshold
      && chunksize_nomask (p) > mp_.mmap_threshold
      && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX
      && !DUMPED_MAIN_ARENA_CHUNK (p))
    {
      mp_.mmap_threshold = chunksize (p);
      mp_.trim_threshold = 2 * mp_.mmap_threshold;
      LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
                  mp_.mmap_threshold, mp_.trim_threshold);
    }
  munmap_chunk (p);
  return;
}

如果不是mmap的内存, 而是从malloc_state获取的:

1
2
ar_ptr = arena_for_chunk (p);
_int_free (ar_ptr, p, 0);

现在走到_int_free里面:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())
{
  //...
  unsigned int idx = fastbin_index(size);
  fb = &fastbin (av, idx);
  /* Atomically link P to its fastbin: P->FD = *FB; *FB = P;  */
  mchunkptr old = *fb, old2;
  //...
  p->fd = old2 = old;
  //...
}

如果当前chunk的size小于fastbin的最大size, 那么就需要塞入到fastbin. 先根据size找到对应的fastbin, 然后把当前chunk塞入到头节点即可. 这里可以得到关于fastbin的几个信息:

fastbin是单向链表
fastbin中每个bin中的chunk大小相等
fastbin对应小内存

unsorted_bin

跟着_int_free继续走, 如果需要释放的chunk比fastbin的最大大小还要大, 则会尝试合并:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
else if (!chunk_is_mmapped(p)) {
  //...

  // 获取下一个chunk
  nextchunk = chunk_at_offset(p, size);

  // 拿到下一个chunk的size
  nextsize = chunksize(nextchunk);

  /* consolidate backward */
  // 如果上一个chunk可被访问, 合并上一个chunk到当前chunk
  if (!prev_inuse(p)) {
    // 获取上一个chunk的大小
    prevsize = prev_size (p);

    // 扩充当前chunk的size
    size += prevsize;

    // 获取上一个chunk的位置
    p = chunk_at_offset(p, -((long) prevsize));
    if (__glibc_unlikely (chunksize(p) != prevsize))
      malloc_printerr ("corrupted size vs. prev_size while consolidating");

    // 将上一个chunk从arena中移除(因为被合并到当前chunk了)
    unlink_chunk (av, p);
  }
  //...

  // 下一个chunk同理, 但是需要考虑是不是在top, 以下假设下一个chunk也合并了

  // 合并后的chunk插入到unsorted_bin的头部
  bck = unsorted_chunks(av);
  fwd = bck->fd;
  if (__glibc_unlikely (fwd->bk != bck))
    malloc_printerr ("free(): corrupted unsorted chunks");
  p->fd = fwd;
  p->bk = bck;
  if (!in_smallbin_range(size))
    {
      p->fd_nextsize = NULL;
      p->bk_nextsize = NULL;
    }
  bck->fd = p;
  fwd->bk = p;

  // 设置新chunk的size
  set_head(p, size | PREV_INUSE);
  set_foot(p, size);
}

现在可以知道了, unsorted_bin在调用free的时候可能会被扩充, 同时, 可能会将待free的chunk的上一个和下一个合并成一个大的chunk, 一起塞入unsorted_bin. 并且unsorted_bin是一个双向链表, 且只有一个.

但是, 如果上述假设的"下一个chunk也合并"不成立, 那么就不会将合并后的chunk(当前和上一个)塞入unsorted_bin, 因为top是比较特殊的结构, 详细见源码.

如果合并后的chunk太大了, 这时候就可以怀疑可能存在较大的内存碎片, 这时候就需要尝试释放部分内存了, 首先就是判断合并后的大小是否大过阈值, 以减少系统调用的次数, 如果大于阈值, 那么先把fastbin合并了.

为什么要合并fastbin才能缩减内存? 我的理解是: free之后, 内存要不就放到了fastbin, 要不就是unsorted_bin(unmmap的就不在考虑范围了), 假设要缩减内存, 那么也就只能在这两个bin中, 另外, 如果要缩减, 那也只能从堆的头部开始缩减, 但是又没法知道包含头部内存区域的chunk在哪个bin中, 又不可能在这时候合并所有的bin(比如把unsorted_bin也给合并成一个, 但是这样效率太低), 所以退而求其次, 先合并fastbin, 以增大扩充top chunk的可能性, 以达到尽可能缩减更大堆的目的.

1
2
3
4
if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) {
  if (atomic_load_relaxed (&av->have_fastchunks))
    malloc_consolidate(av);
//......

之后调用systrim(mp_.top_pad, av);或heap_trim(heap, mp_.top_pad);来缩减堆.

smallbin和largebin

现在回过去看_int_malloc就比较容易理解了, 我们跳过从fastbin或者smallbin或者largebin中拿chunk的部分, 如果上面说的几个bin中没有chunk怎么办? 那就从unsorted_bin中找.(对应上一篇未完成的第四部分)

先是判断unsorted_bin中是不是有chunk:

1
while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))

然后拿到unsorted_bin的最后一个chunk和倒数第二个chunk:

1
2
3
4
// bck表示上一个chunk
bck = victim->bk;
size = chunksize (victim);
mchunkptr next = chunk_at_offset (victim, size);

如果unsorted_bin只有一个chunk(就没必要查找了), 并且当前需求的是一个small chunk, 并且需求的大小比这个chunk的大小还小一些, 那么就裁剪这个chunk, 然后放回需要的部分:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// 检查大小以及unsorted_bin中chunk是不是只有一个
if (in_smallbin_range (nb) &&
    // 判断unsorted_chunks中只有一个chunk
    bck == unsorted_chunks (av) &&
    victim == av->last_remainder &&
    (unsigned long) (size) > (unsigned long) (nb + MINSIZE))
{
  /* split and reattach remainder */
  // 只分配需要的大小
  remainder_size = size - nb;

  // 多余的部分继续塞在unsorted_chunks
  remainder = chunk_at_offset (victim, nb);
  unsorted_chunks (av)->bk = unsorted_chunks (av)->fd = remainder;
  av->last_remainder = remainder;
  remainder->bk = remainder->fd = unsorted_chunks (av);
  if (!in_smallbin_range (remainder_size))
    {
      remainder->fd_nextsize = NULL;
      remainder->bk_nextsize = NULL;
    }

  // 设置chunk信息
  set_head (victim, nb | PREV_INUSE |
            (av != &main_arena ? NON_MAIN_ARENA : 0));
  set_head (remainder, remainder_size | PREV_INUSE);
  set_foot (remainder, remainder_size);
  check_malloced_chunk (av, victim, nb);
  void *p = chunk2mem (victim);
  alloc_perturb (p, bytes);
  return p;
}

不过, 这里还是没有smallbin什么事, 下次free的时候, 这个chunk又被还给unsorted_bin了. 继续往下看:

如果上面的不满足, 那先把这个chunk从unsorted_bin拿出来:

1
2
unsorted_chunks (av)->bk = bck;
bck->fd = unsorted_chunks (av);

如果大小正好满足, 那直接返回, 否则:

如果chunk的size在smallbin的范围, 那先找到smallbin对应大小chunk的位置(largebin类似):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
if (in_smallbin_range (size))
  {
    victim_index = smallbin_index (size);
    bck = bin_at (av, victim_index);
    fwd = bck->fd;
  }
else
  {
    //...
    //largebin的塞入位置是有序的, 会判断当前chunk的size而返回不同的插入部位
    //总之, 这一步之后, largebin会变得有序
  }

然后把当前chunk塞进去:

1
2
3
4
5
mark_bin (av, victim_index);
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim;

但是现在并不会立即退出上文说的while循环, 而是直到达到最大的循环次数或者unsorted_bin中找到了需求的元素(只有一个chunk且大小满足或者大小刚好合适)或者unsorted_bin的所有元素都被塞入了smallbin或者largebin:

1
2
3
#define MAX_ITERS       10000
if (++iters >= MAX_ITERS)
  break;

以上, 假设没从unsorted_bin中找到合适的目标, 那么这时候unsorted_bin中所有的chunk都被填入了smallbin或者largebin了. 现在开始从这两个bin中查找:

首先是largebin:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
if (!in_smallbin_range (nb))
  {
    /* skip scan if empty or largest chunk is too small */
    if ((victim = first (bin)) != bin
        && (unsigned long) chunksize_nomask (victim)
          >= (unsigned long) (nb))
      {
        victim = victim->bk_nextsize;
        while (((unsigned long) (size = chunksize (victim)) <
                (unsigned long) (nb)))
          victim = victim->bk_nextsize;
        //...

        if (remainder_size < MINSIZE)
          {
            //...
          }
        /* Split */
        else
          {
            //...
          }
        //...
      }
  }

不需要过多关注查找逻辑, 以上表达的是, 从largebin中找chunk, 是需要找最小的适合大小, 如果最终找到的目标还是太大了, 那么会裁剪, 多余的部分放在remainder表中, 这部分逻辑与上文是类似的.

如果上部分还是没有找到, 那么就一个一个遍历后续的bin(因为bin是按照大小排序的, 这部分代码省略), 先找到有chunk的bin, 然后找满足chunk大小的目标, 类似上述逻辑, 如果没有大很多, 则直接返回, 如果大很多, 那么先裁切, 多余部分串在remainder表中. 这部分逻辑就不继续追究了.

如果还是没找到!!! 那么尝试使用top的内存(堆顶内存), 如果top内存满足, 同样是裁切然后返回, 如果不满足, 那么检查是否可以合并fastbin, 如果可以合并, 那么就合并, 再继续查找(回到unsorted_bin, 再重复上述逻辑), 如果没有fastbin, 那么就扩展当前arena, 然后直接返回一个chunk, 如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
victim = av->top;
size = chunksize (victim);
if (__glibc_unlikely (size > av->system_mem))
  malloc_printerr ("malloc(): corrupted top size");
if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
  {
    remainder_size = size - nb;
    remainder = chunk_at_offset (victim, nb);
    av->top = remainder;
    set_head (victim, nb | PREV_INUSE |
              (av != &main_arena ? NON_MAIN_ARENA : 0));
    set_head (remainder, remainder_size | PREV_INUSE);
    check_malloced_chunk (av, victim, nb);
    void *p = chunk2mem (victim);
    alloc_perturb (p, bytes);
    return p;
  }
/* When we are using atomic ops to free fast chunks we can get
    here for all block sizes.  */
else if (atomic_load_relaxed (&av->have_fastchunks))
  {
    malloc_consolidate (av);
    /* restore original bin index */
    if (in_smallbin_range (nb))
      idx = smallbin_index (nb);
    else
      idx = largebin_index (nb);
  }
/*
    Otherwise, relay to handle system-dependent cases
  */
else
  {
    void *p = sysmalloc (nb, av);
    if (p != NULL)
      alloc_perturb (p, bytes);
    return p;
  }

sysmalloc用来sbrk或者mmap内存, 上一篇中有提到过.

小结

以上:

smallbin和largebin类似, 都是双链表结构, 用来存储fastbin以外的比较大的chunk
smallbin中的chunk不保证顺序
largebin中的chunk保证大小顺序
unsorted_bin中的chunk可能会被转移到smallbin或largebin中
chunk可以合并, 也可以裁切
如果bin中的chunk不满足需求, 则可能通过top分配chunk
如果top的内存也不满足需求, 则malloc的时候也可能尝试合并fastbin
如果fastbin也不满足需求, 那么会尝试扩展arena

tcache

略.(TODO:待补充)

总结

先来回答最开始的问题:

tcache: 略
fastbin: 单向链表, 存放小内存, 每个bin中的size相同, free的时候小内存直接还给fastbin, malloc的时候小内存也可以从fastbin获取
smallbin: 双向链表, 存放介于fastbin和largebin大小之间的内存, bin有大小顺序, bin中的chunk没有大小顺序, 可以通过unsorted_bin扩充；largebin同smallbin, 但是每个bin的chunks也是有大小序的
unsorted_bin: 类似回收机制, free的时候比fastbin大的chunk(mmap极大chunk除外)会被放到unsorted_bin, 并且这个chunk可能会与上一个和下一个合并后再加入unsorted_bin, fastbin在某个阈值之后也会被合并加入到unsorted_bin
以上如何管理？我的理解是fastbin/smallbin/largebin用于获取内存(unsorted_bin也有概率直接返回给用户内存), fastbin/unsorted_bin用于回收内存, 但是如果发现回收内存太大时, 则可能会通过trim缩减内存
以上什么关系？同上
bin是怎么初始化的？首先, arena的初始化也伴随着bin的初始化, 所以进程/线程初次调用malloc的时候, 就会初始化bin了, 如果各种bin(或者说arena)不满足需求了, 则还会通过sysmalloc扩充bin(或arena)
bin是怎么扩充的，可以扩充吗？同上, 但是bin的数量是固定的, bin中的chunk的数量未见有限制

malloc的一般故事流程是(要说清楚每一个case的话, 太复杂了, 所以以下是我认为比较通常的情况):

启动进程, 调用malloc
malloc init, 初始化main arena
通过各种bin没有找到需要的chunk, 扩充bin, 返回一个chunk
调用free
回收内存, 如果是很小的内存, 则暂存在fastbin, 如果是比较大的内存, 则与相邻内存合并后放入unsorted_bin
再调用malloc
如果发现fastbin符合要求, 则直接返回fastbin的内存
如果发现smallbin/largebin符合要求, 则返回对应内存
如果以上都不满足, 则从unsorted_bin中将内存搬运到smallbin/largebin后, 再从这两个bin中查找
重复以上步骤4~9, 发现回收的内存很大, 则尝试缩减内存
启动了一个新的线程
从free list尝试获取一个arena, 如果没法获取, 创建一个新的arena, 线程独立使用, 所以几乎无需担心竞争
如果不允许创建新的arena了, 则尝试复用"最老"的一个arena, 需要当心竞争, 所以会阻塞
线程退出, 检查对应的arena, 如果计数器为0了, 那么将对应arena加入free list