libstdc++/41975

François Dumont frs.dumont@gmail.com
Wed Nov 9 20:39:00 GMT 2011


Hi

     I just wanted to let you know that I am making progress on this issue.

     I have finally removed the begin/end nodes in buckets. I now also 
know how to replace the doubly linked list with a singly linked one, 
but I need a couple of additional days to finalize the patch, so I 
hope to submit a new proposal by Monday. I hope it will be OK for 
4.7.0.
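[Editor's note: the singly linked layout mentioned above could look roughly like the sketch below. All names and the "node before the bucket's first element" trick are illustrative guesses, not the actual libstdc++ implementation: every element lives on one singly linked list, and each bucket caches the node preceding its first element so relinking works without per-node back-pointers.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch only -- names and details are invented, not the
// actual libstdc++ types.  All elements live on a single singly linked
// list; each bucket caches a pointer to the node *before* its first
// element, so insertion never needs per-node back-pointers.
struct Node {
    int value;
    Node* next;
};

struct TinyTable {
    Node head{0, nullptr};       // sentinel preceding the whole list
    std::vector<Node*> buckets;  // bucket -> node before its first element

    explicit TinyTable(std::size_t n) : buckets(n, nullptr) {}
    ~TinyTable() {
        for (Node* cur = head.next; cur != nullptr;) {
            Node* next = cur->next;
            delete cur;
            cur = next;
        }
    }

    std::size_t bucket_index(int v) const {
        return static_cast<std::size_t>(v) % buckets.size();
    }

    void insert(int v) {
        std::size_t b = bucket_index(v);
        // Empty bucket: splice at the global front; otherwise splice
        // right after the bucket's cached "before" node.
        Node* before = buckets[b] != nullptr ? buckets[b] : &head;
        Node* n = new Node{v, before->next};
        before->next = n;
        if (buckets[b] == nullptr)
            buckets[b] = &head;
        // If the node now following 'n' belongs to a different bucket
        // (only possible when we inserted at the global front), that
        // bucket's "before" pointer must become the new node.
        if (n->next != nullptr) {
            std::size_t nb = bucket_index(n->next->value);
            if (nb != b)
                buckets[nb] = n;
        }
    }

    bool contains(int v) const {
        std::size_t b = bucket_index(v);
        Node* cur = buckets[b] != nullptr ? buckets[b]->next : nullptr;
        for (; cur != nullptr && bucket_index(cur->value) == b; cur = cur->next)
            if (cur->value == v)
                return true;
        return false;
    }
};
```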

Regards


On 10/21/2011 09:25 PM, François Dumont wrote:
>     Here is an other step. This time I have changed the _Bucket type, 
> it used to be a simple _Node pointer, it is now a struct containing 2 
> nodes, the bucket begin and past-the-end nodes. This way I have been 
> able to remove the newly introduced std::fill calls.
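[Editor's note: a minimal way to picture the _Bucket change described above. The sketch below is simplified and invented, not the actual libstdc++ code: pointers stand in for the embedded begin/past-the-end nodes. The point is that with cached boundaries, walking one bucket touches only its own nodes.]

```cpp
#include <cassert>
#include <cstddef>

// Simplified sketch, not the actual libstdc++ code: element nodes form
// a doubly linked list, and each bucket caches its boundaries so that
// traversing a bucket never scans neighbouring (possibly empty) buckets.
struct DNode {
    int value;
    DNode* prev;
    DNode* next;
};

struct Bucket {
    DNode* begin = nullptr;  // first node of the bucket (nullptr if empty)
    DNode* end = nullptr;    // node just past the bucket's last node
};

// Count the elements of one bucket by walking [begin, end).
std::size_t bucket_size(const Bucket& b) {
    std::size_t count = 0;
    for (DNode* cur = b.begin; cur != b.end; cur = cur->next)
        ++count;
    return count;
}
```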
>
>     I also changed the performance test so that it clearly shows the 
> performance issue reported in PR 41975. It is now so pronounced that 
> I had to roll back the number of tested elements to 200 000; with 
> 5 000 000 the current implementation was really too slow. I also 
> added tests comparing erase from an iterator with erase from a key.
>
>     Here are the performance results:
>
> - Current implementation:
> 41975.cc    first insert           2r     2u     0s   8452192mem    0pf
> 41975.cc    erase from iterator    2832r  2817u  3s  -6400096mem    0pf
> 41975.cc    second insert          2r     1u     0s   6400032mem    0pf
> 41975.cc    erase from key         1r     1u     0s  -6400032mem    0pf
>
> - With patch:
> 41975.cc    first insert           3r     2u     0s  13704256mem    0pf
> 41975.cc    erase from iterator    1r     1u     0s  -9600064mem    0pf
> 41975.cc    second insert          2r     2u     0s   9600000mem    0pf
> 41975.cc    erase from key         1r     1u     0s  -9600000mem    0pf
>
>     I also started to document the new data structure a little; I 
> will enhance the documentation if the approach is finally adopted.
>
>     With this implementation there is only one place left where big 
> holes in the buckets have an impact: when inserting a node into a 
> bucket for the first time, we need to look for the previous non-empty 
> bucket to link the nodes correctly. I will try to find a test showing 
> this problem. A good way to avoid it would still be to shrink the 
> hashtable, which is possible in the insert methods.
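[Editor's note: the remaining cost described above can be pictured as a backward scan. The helper below is purely illustrative, not the patch's code: before linking the first node of an empty bucket, the nearest earlier non-empty bucket must be found, so big holes make this scan long.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of the remaining O(distance) spot: before
// linking the first node of bucket 'idx', find the nearest non-empty
// bucket before it.  Large holes make this backward scan expensive.
std::size_t prev_non_empty(const std::vector<bool>& occupied, std::size_t idx) {
    while (idx > 0) {
        --idx;
        if (occupied[idx])
            return idx;  // nearest occupied bucket before the original idx
    }
    return static_cast<std::size_t>(-1);  // no earlier bucket is occupied
}
```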
>
>     I have started to think about getting the same performance 
> without adding so much memory, but I fear that this will have an 
> impact on the operations. I will give it a try, however.
>
> François
>
>
> On 10/12/2011 10:42 PM, Paolo Carlini wrote:
>> Hi,
>>> Here is another step forward on this issue. In this step nodes 
>>> are now doubly linked to each other. This avoids some loops over 
>>> buckets, but performance is still not OK. I have created a 
>>> performance test; see the attached 41975.cc.
>> I think we may as well commit it; I can do it.
>>> Before the patch the results were:
>>>
>>> 41975.cc    Container generation    2r    1u    0s   8452192mem    0pf
>>> 41975.cc    Container erase         21r   21u   0s  -6400096mem    0pf
>>> 41975.cc    Container iteration     10r   10u   0s        32mem    0pf
>>>
>>> With the attached patch it is:
>>>
>>> 41975.cc    Container generation    3r    2u    0s  11652176mem    0pf
>>> 41975.cc    Container erase         185r  182u  0s  -9600080mem    0pf
>>> 41975.cc    Container iteration     0r    0u    0s        48mem    0pf
>>>
>>> Of course, more memory is manipulated. Regarding per-operation CPU 
>>> consumption, iteration is much better but erase is much worse. I 
>>> think the problem is the invocation of std::fill, which costs a lot 
>>> as soon as we have a lot of buckets to update. This is why I have 
>>> already started to work on a final change to avoid those calls.
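[Editor's note: the std::fill cost described above can be illustrated schematically. The helper below is invented, not the actual patch: when a run of consecutive (empty) bucket entries shares a cached pointer into the erased range, the whole run has to be rewritten, so one erase can touch O(bucket_count) entries.]

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Schematic only: bucket entries are treated as opaque cached pointers;
// erasing the node that a whole run of buckets points at forces
// resetting the entire run, which is what the std::fill calls were doing.
void reset_bucket_run(std::vector<const void*>& buckets,
                      std::size_t first, std::size_t last) {
    std::fill(buckets.begin() + static_cast<std::ptrdiff_t>(first),
              buckets.begin() + static_cast<std::ptrdiff_t>(last),
              nullptr);
}
```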
>> Excellent. I'm afraid I still don't have much to say or help with, 
>> and I don't want to distract you anyway, but where does the 
>> additional complexity of erase come from? Could you describe in a 
>> few sentences the steps you go through at erase time?
>>
>> I was also thinking that it would be great if you could add comments 
>> to the code, even if some have to be redone when you rework it in 
>> the following drafts. And an introductory comment right before 
>> _Hashtable describing the data structures, like, e.g., the one in 
>> std::deque, which is very well done, IMO.
>>
>> Tomorrow, besides committing the performance testcase, I will also 
>> add those missing count testcases. Just to be of some small help ;)
>>
>> As soon as I'm done with a couple of pending things, I should be able 
>> to assist you a bit.
>>
>> Thanks,
>> Paolo.
>>
>
>


