时间:2021-05-23
1. 基本介绍
tensorflow设备内存管理模块实现了一个best-fit with coalescing算法(后文简称bfc算法)。
bfc算法是Doung Lea's malloc(dlmalloc)的一个非常简单的版本。
它具有内存分配、释放、碎片管理等基本功能。
2. bfc基本算法思想
1. 数据结构
整个内存空间由一个按基址升序排列的Chunk双向链表来表示,它们的直接前趋和后继必须在地址连续的内存空间。Chunk结构体里含有实际大小、请求大小、是否被占用、基址、直接前趋、直接后继、Bin索引等信息。
2. 申请
用户申请一个内存块(malloc)。根据chunk双链表找到一个合适的内存块,如果该内存块的大小是用户申请的大小的二倍以上,那么就将该内存块切分成两块,这就是split操作。
返回其中一块给用户,并将该内存块标识为占用
Spilt操作会新增一个chunk,所以需要修改chunk双链表以维持前驱和后继关系
如果用户申请512的空间,正好有一块1024的chunk2是空闲的,由于1024/512 =2,所以chunk2 被split为2块:chunk2_1和chunk2_2。返回chunk2_1给用户并将其标志位占用状态。
3. 释放
用户释放一个内存块(free)。先将该块标记为空闲。然后根据chunk数据结构中的信息找到其前驱和后继内存块。如果前驱和后继块中有空闲的块,那么将刚释放的块和空闲的块合并成一个更大的chunk(这就是merge操作,合并当前块和其前后的空闲块)。再修改双链表结构以维持前驱后继关系。这就做到了内存碎片的回收。
如果用户要free chunk3,由于chunk3的前驱chunk2也是空闲的,所以将chunk2和chunk3合并得到一个新的chunk2',大小为chunk2和chunk3之和。
3. bins
1. bins数据结构
bfc算法采取的是被动分块的策略。最开始整个内存是一个chunk,随着用户申请空间的次数增加,最开始的大chunk会被不断的split开来,从而产生越来越多的小chunk。当chunk数量很大时,为了寻找一个合适的内存块而遍历双链表无疑是一笔巨大的开销。为了实现对空闲块的高效管理,bfc算法设计了bin这个抽象数据结构。
每个bin都有一个size属性,一个bin是一个拥有chunk size >= binsize的空闲chunk的集合。集合中的chunk按照chunk size的升序组织成单链表。bfc算法维护了一个bin的集合:bins。它由多个bin以及从属于每个bin的chunks组成。内存中所有的空闲chunk都由bins管理。
图中每一列表示一个bin,列首方格中的数字表示bin的size。bin size的大小都是256的2^n的倍。每个bin下面挂载了一系列的空闲chunk,每个chunk的chunk size都大于等于所属的bin的bin size,按照chunk size的升序挂载成单链表。
2. bins操作
bfc算法针对bins这个集合设计了三个操作:search、insert、delete。
search
给定一个chunk size,从bins中找到大于等于该chunksize的最小的那个空闲chunk。Search操作具体流程如下。如果bin以数组的形式组织,那么可以从index = chunk size /256 >>2的那个bin开始查找。最好的情况是开始查找的那个bin的chunk链表非空,那么直接返回链表头即可。这种情况时间复杂度是常数级的。最坏的情况是遍历bins数组中所有的bin。对于一般大小的内存来说,bins数组元素非常少,比如4G空间只需要23个bin就足够了(256 * 2 ^ 23 > 4G),因此也很快能返回结果。总体来说search操作是非常高效的。对于固定大小内存来说,查找时间是常数量级的。
insert
将一个空闲的chunk插入到一个bin所挂载的chunk链表中,同时需要维持chunk链表的升序关系。具体流程是直接将chunk插入到index = chunk size /256 >>2的那个bin中即可。
delete
将一个空闲的chunk从bins中移除。
4. 总结
将内存分块管理,按块进行空间分配和释放。
通过split操作将大内存块分解成用户需要的小内存块。
通过merge操作合并小的内存块,做到内存碎片回收
通过bin这个抽象数据结构实现对空闲块高效管理。
5. 代码分析
1. 代码地址
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/common_runtime
2. 数据结构
Chunk
static const int kInvalidChunkHandle = -1;...struct Chunk { size_t size = 0; // Full size of buffer. // We sometimes give chunks that are larger than needed to reduce // fragmentation. requested_size keeps track of what the client // actually wanted so we can understand whether our splitting // strategy is efficient. size_t requested_size = 0; // allocation_id is set to -1 when the chunk is not in use. It is assigned a // value greater than zero before the chunk is returned from // AllocateRaw, and this value is unique among values assigned by // the parent allocator. int64 allocation_id = -1; void* ptr = nullptr; // pointer to granted subbuffer. // If not kInvalidChunkHandle, the memory referred to by 'prev' is directly // preceding the memory used by this chunk. E.g., It should start // at 'ptr - prev->size' ChunkHandle prev = kInvalidChunkHandle; // If not kInvalidChunkHandle, the memory referred to by 'next' is directly // following the memory used by this chunk. E.g., It should be at // 'ptr + size' ChunkHandle next = kInvalidChunkHandle; // What bin are we in? BinNum bin_num = kInvalidBinNum; bool in_use() const { return allocation_id != -1; }};Bin
// A Bin is a collection of similar-sized free chunks.struct Bin { // All chunks in this bin have >= bin_size memory. size_t bin_size = 0; struct ChunkComparator { explicit ChunkComparator(BFCAllocator* allocator) : allocator_(allocator) {} // Sort first by size and then use pointer address as a tie breaker. bool operator()(const ChunkHandle ha, const ChunkHandle hb) const NO_THREAD_SAFETY_ANALYSIS { const Chunk* a = allocator_->ChunkFromHandle(ha); const Chunk* b = allocator_->ChunkFromHandle(hb); if (a->size != b->size) { return a->size < b->size; } return a->ptr < b->ptr; } private: BFCAllocator* allocator_; // The parent allocator }; typedef std::set<ChunkHandle, ChunkComparator> FreeChunkSet; // List of free chunks within the bin, sorted by chunk size. // Chunk * not owned. FreeChunkSet free_chunks; Bin(BFCAllocator* allocator, size_t bs) : bin_size(bs), free_chunks(ChunkComparator(allocator)) {}};AllocationRegion
AllocationRegion给一个连续的内存区域做指针到ChunkHandle的映射。
RegionManager
RegionManager聚集了一个或多个AllocationRegion,并提供一个从指针到基础ChunkHandle的间接层,这个间接层可在多个不连续的内存区域进行分配。
3. 分配大小
将每次分配的内存大小调整为kMinAllocationSize的N倍,这样所有内存地址都是很好地按字节对齐了。
// kMinAllocationSize = 256static const size_t kMinAllocationBits = 8;static const size_t kMinAllocationSize = 1 << kMinAllocationBits;...size_t BFCAllocator::RoundedBytes(size_t bytes) { size_t rounded_bytes = (kMinAllocationSize * ((bytes + kMinAllocationSize - 1) / kMinAllocationSize)); DCHECK_EQ(size_t{0}, rounded_bytes % kMinAllocationSize); return rounded_bytes;}4. 初始化bin
typedef int BinNum;static const int kInvalidBinNum = -1;static const int kNumBins = 21;...// 二进制2^8往左移0,1,2位// (static_cast<size_t>(256) << 0) = 256// (static_cast<size_t>(256) << 1) = 512// (static_cast<size_t>(256) << 2) = 1024size_t BinNumToSize(BinNum index) { return static_cast<size_t>(256) << index;}...char bins_space_[sizeof(Bin) * kNumBins];// Map from bin size to BinBin* BinFromIndex(BinNum index) { return reinterpret_cast<Bin*>(&(bins_space_[index * sizeof(Bin)]));}...// We create bins to fit all possible ranges that cover the// memory_limit_ starting from allocations up to 256 bytes to// allocations up to (and including) the memory limit.for (BinNum b = 0; b < kNumBins; b++) { size_t bin_size = BinNumToSize(b); VLOG(1) << "Creating bin of max chunk size " << strings::HumanReadableNumBytes(bin_size); new (BinFromIndex(b)) Bin(this, bin_size); CHECK_EQ(BinForSize(bin_size), BinFromIndex(b)); CHECK_EQ(BinForSize(bin_size + 255), BinFromIndex(b)); CHECK_EQ(BinForSize(bin_size * 2 - 1), BinFromIndex(b)); if (b + 1 < kNumBins) { CHECK_NE(BinForSize(bin_size * 2), BinFromIndex(b)); }}5. 查找bin
// 求属于第几个binBinNum BinNumForSize(size_t bytes) { uint64 v = std::max<size_t>(bytes, 256) >> kMinAllocationBits; int b = std::min(kNumBins - 1, Log2FloorNonZero(v)); return b;}// 最高位非零的二进制位数,eg: 0001 0101B 为5inline int Log2FloorNonZero(uint64 n) {#if defined(__GNUC__) return 63 ^ __builtin_clzll(n);#elif defined(PLATFORM_WINDOWS) unsigned long index; _BitScanReverse64(&index, n); return index;#else int r = 0; while (n > 0) { r++; n >>= 1; } return r;#endif}6. 查找Chunk
// 先加锁mutex_lock l(lock_);void* ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);if (ptr != nullptr) { return ptr;}// FindChunkPtr函数内部void* BFCAllocator::FindChunkPtr(BinNum bin_num, size_t rounded_bytes, size_t num_bytes) { // First identify the first bin that could satisfy rounded_bytes. for (; bin_num < kNumBins; bin_num++) { // Start searching from the first bin for the smallest chunk that fits // rounded_bytes. Bin* b = BinFromIndex(bin_num); for (auto citer = b->free_chunks.begin(); citer != b->free_chunks.end(); ++citer) { // 从之前得到的Bin索引开始,查找合适的空闲Chunk: const BFCAllocator::ChunkHandle h = (*citer); BFCAllocator::Chunk* chunk = ChunkFromHandle(h); DCHECK(!chunk->in_use()); if (chunk->size >= rounded_bytes) { // We found an existing chunk that fits us that wasn't in use, so remove // it from the free bin structure prior to using. RemoveFreeChunkIterFromBin(&b->free_chunks, citer); // If we can break the size of the chunk into two reasonably // large pieces, do so. // // TODO(vrv): What should be the criteria when deciding when // to split? // 具体实现后面会分析 if (chunk->size >= rounded_bytes * 2) { SplitChunk(h, rounded_bytes); chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved } // The requested size of the returned chunk is what the user // has allocated. chunk->requested_size = num_bytes; // Assign a unique id and increment the id counter, marking the // chunk as being in use. chunk->allocation_id = next_allocation_id_++; // Update stats. ++stats_.num_allocs; stats_.bytes_in_use += chunk->size; stats_.max_bytes_in_use = std::max(stats_.max_bytes_in_use, stats_.bytes_in_use); stats_.max_alloc_size = std::max<std::size_t>(stats_.max_alloc_size, chunk->size); VLOG(4) << "Returning: " << chunk->ptr; if (VLOG_IS_ON(4)) { LOG(INFO) << "A: " << RenderOccupancy(); } return chunk->ptr; } } } return nullptr;}7. 拆分Chunk
如果Chunk的大小大于等于申请内存大小的2倍,那么将该Chunk拆分成2个:第一个Chunk的大小等于申请内存大小,第二个Chunk作为它的直接后继。
if (chunk->size >= rounded_bytes * 2) { SplitChunk(h, rounded_bytes); chunk = ChunkFromHandle(h); // Update chunk pointer in case it moved}void BFCAllocator::SplitChunk(BFCAllocator::ChunkHandle h, size_t num_bytes) { // Allocate the new chunk before we do any ChunkFromHandle ChunkHandle h_new_chunk = AllocateChunk(); Chunk* c = ChunkFromHandle(h); CHECK(!c->in_use() && (c->bin_num == kInvalidBinNum)); // Create a new chunk starting num_bytes after c BFCAllocator::Chunk* new_chunk = ChunkFromHandle(h_new_chunk); new_chunk->ptr = static_cast<void*>(static_cast<char*>(c->ptr) + num_bytes); region_manager_.set_handle(new_chunk->ptr, h_new_chunk); // Set the new sizes of the chunks. new_chunk->size = c->size - num_bytes; c->size = num_bytes; // The new chunk is not in use. new_chunk->allocation_id = -1; // Maintain the pointers. // c <-> c_neighbor becomes // c <-> new_chunk <-> c_neighbor BFCAllocator::ChunkHandle h_neighbor = c->next; new_chunk->prev = h; new_chunk->next = h_neighbor; c->next = h_new_chunk; if (h_neighbor != kInvalidChunkHandle) { Chunk* c_neighbor = ChunkFromHandle(h_neighbor); c_neighbor->prev = h_new_chunk; } // Add the newly free chunk to the free bin. InsertFreeChunkIntoBin(h_new_chunk);}8. 回收chunk
加锁,获得ChunkHandle
mutex_lock l(lock_);BFCAllocator::ChunkHandle h = region_manager_.get_handle(ptr);FreeAndMaybeCoalesce(h);FreeAndMaybeCoalesce
void BFCAllocator::FreeAndMaybeCoalesce(BFCAllocator::ChunkHandle h) { Chunk* c = ChunkFromHandle(h); CHECK(c->in_use() && (c->bin_num == kInvalidBinNum)); // Mark the chunk as no longer in use c->allocation_id = -1; // Updates the stats. stats_.bytes_in_use -= c->size; // This chunk is no longer in-use, consider coalescing the chunk // with adjacent chunks. ChunkHandle chunk_to_reassign = h; // If the next chunk is free, coalesce the two if (c->next != kInvalidChunkHandle) { Chunk* cnext = ChunkFromHandle(c->next); if (!cnext->in_use()) { // VLOG(8) << "Chunk at " << cnext->ptr << " merging with c " << // c->ptr; chunk_to_reassign = h; // Deletes c->next RemoveFreeChunkFromBin(c->next); Merge(h, ChunkFromHandle(h)->next); } } // If the previous chunk is free, coalesce the two c = ChunkFromHandle(h); if (c->prev != kInvalidChunkHandle) { Chunk* cprev = ChunkFromHandle(c->prev); if (!cprev->in_use()) { // VLOG(8) << "Chunk at " << c->ptr << " merging into c->prev " // << cprev->ptr; chunk_to_reassign = c->prev; // Deletes c RemoveFreeChunkFromBin(c->prev); Merge(ChunkFromHandle(h)->prev, h); c = ChunkFromHandle(h); } } InsertFreeChunkIntoBin(chunk_to_reassign);}Merge
// Merges h1 and h2 when Chunk(h1)->next is h2 and Chunk(h2)->prev is c1.// We merge Chunk(h2) into Chunk(h1).void BFCAllocator::Merge(BFCAllocator::ChunkHandle h1, BFCAllocator::ChunkHandle h2) { Chunk* c1 = ChunkFromHandle(h1); Chunk* c2 = ChunkFromHandle(h2); // We can only merge chunks that are not in use. CHECK(!c1->in_use() && !c2->in_use()); // c1's prev doesn't change, still points to the same ptr, and is // still not in use. // Fix up neighbor pointers // // c1 <-> c2 <-> c3 should become // c1 <-> c3 BFCAllocator::ChunkHandle h3 = c2->next; c1->next = h3; CHECK(c2->prev == h1); if (h3 != kInvalidChunkHandle) { BFCAllocator::Chunk* c3 = ChunkFromHandle(h3); c3->prev = h1; } // Set the new size c1->size += c2->size; DeleteChunk(h2);}以上这篇TensorFlow内存管理bfc算法实例就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持。
声明:本页内容来源网络,仅供用户参考;我单位不保证亦不表示资料全面及准确无误,也不保证亦不表示这些资料为最新信息,如因任何原因,本网内容或者用户因倚赖本网内容造成任何损失或损害,我单位将不会负任何法律责任。如涉及版权问题,请提交至online#300.cn邮箱联系删除。
本文实例为大家分享了tensorflow实现弹性网络回归算法,供大家参考,具体内容如下python代码:#用tensorflow实现弹性网络算法(多变量)#使用
一、Tensorflow安装1、Tensorflow介绍Tensorflow是广泛使用的实现机器学习以及其它涉及大量数学运算的算法库之一。Tensorflow由
Lua使用基于被内置在Lua某些算法的垃圾收集自动内存管理。可以自动内存管理的结果,作为一个开发者:没有必要担心的对象分配内存。无需释放他们时,不再需要可将其设
内存管理可以说是一个比较难学的模块,之所以比较难学。一是内存管理涉及到硬件的实现原理和软件的复杂算法,二是网上关于内存管理的解释有太多错误的解释。希望可以做个内
TensorFlow模型保存/载入我们在上线使用一个算法模型的时候,首先必须将已经训练好的模型保存下来。tensorflow保存模型的方式与sklearn不太一