[PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space
Ken Matsui
kmatsui@cs.washington.edu
Thu Sep 14 21:44:06 GMT 2023
On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:
>
> > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> > index 545f0f4d9eb..eed6deaf0f8 100644
> > --- a/gcc/c/c-parser.h
> > +++ b/gcc/c/c-parser.h
> > @@ -51,14 +51,14 @@ enum c_id_kind {
> > /* A single C token after string literal concatenation and conversion
> > of preprocessing tokens to tokens. */
> > struct GTY (()) c_token {
> > + /* If this token is a keyword, this value indicates which keyword.
> > + Otherwise, this value is RID_MAX. */
> > + ENUM_BITFIELD (rid) keyword : 16;
> > /* The kind of token. */
> > ENUM_BITFIELD (cpp_ttype) type : 8;
> > /* If this token is a CPP_NAME, this value indicates whether also
> > declared as some kind of type. Otherwise, it is C_ID_NONE. */
> > ENUM_BITFIELD (c_id_kind) id_kind : 8;
> > - /* If this token is a keyword, this value indicates which keyword.
> > - Otherwise, this value is RID_MAX. */
> > - ENUM_BITFIELD (rid) keyword : 8;
> > /* If this token is a CPP_PRAGMA, this indicates the pragma that
> > was seen. Otherwise it is PRAGMA_NONE. */
> > ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>
> If you want to optimize layout, I'd expect flags to move so it can share
> the same 32-bit unit as the pragma_kind bit-field (not sure if any changes
> should be made to the declaration of flags to maximise the chance of such
> sharing across different host bit-field ABIs).
>
Thank you for your review!
I did not make this change aggressively, but we can do the following
to minimize the fragmentation:
struct GTY (()) c_token {
tree value; /* pointer, depends, but 4 or 8 bytes as usual */
location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
unsigned char flags; /* 1 byte */
}
Supposing a pointer size is 8 bytes and int is 4 bytes, the struct
size would be 24 bytes. The internal fragmentation would be 0 bytes,
and the external fragmentation is 6 bytes since the overall struct
alignment requirement is $K_{max} = 8$ from the pointer.
Here is the original struct before making keyword 16-bit. The overall
struct alignment requirement is $K_{max} = 8$ from the pointer. This
struct size would be 24 bytes since the internal fragmentation is 4
bytes (after location), and the external fragmentation is 3 bytes.
struct GTY (()) c_token {
ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
ENUM_BITFIELD (rid) keyword : 8; /* 1 byte */
ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
tree value; /* pointer, depends, but 4 or 8 bytes as usual */
unsigned char flags; /* 1 byte */
}
If we keep the original order with the 16-bit keyword, the struct size
would be 32 bytes (my current implementation as well, I will update
this patch).
struct GTY (()) c_token {
ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
tree value; /* pointer, depends, but 4 or 8 bytes as usual */
unsigned char flags; /* 1 byte */
}
Likewise, the overall struct alignment requirement is $K_{max} = 8$
from the pointer. The internal fragmentation would be 7 bytes (3 bytes
after pragma_kind + 4 bytes after location), and the external
fragmentation would be 7 bytes.
I think optimizing the size is worth doing unless this breaks GCC.
> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 6cbb9a8e031..3c3c482c6ce 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
> > /* A C++ token. */
> >
> > struct GTY (()) cp_token {
> > - /* The kind of token. */
> > - enum cpp_ttype type : 8;
> > /* If this token is a keyword, this value indicates which keyword.
> > Otherwise, this value is RID_MAX. */
> > - enum rid keyword : 8;
> > + enum rid keyword : 16;
> > + /* The kind of token. */
> > + enum cpp_ttype type : 8;
> > /* Token flags. */
> > unsigned char flags;
> > /* True if this token is from a context where it is implicitly extern "C" */
>
> You're missing an update to the "3 unused bits." comment further down.
>
> > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
> > unsigned int directive_index : 7; /* If is_directive,
> > then index into directive table.
> > Otherwise, a NODE_OPERATOR. */
> > - unsigned int rid_code : 8; /* Rid code - for front ends. */
> > + unsigned int rid_code : 16; /* Rid code - for front ends. */
> > unsigned int flags : 9; /* CPP flags. */
> > ENUM_BITFIELD(node_type) type : 2; /* CPP node type. */
>
> You're missing an update to the "5 bits spare." comment further down.
>
Thank you!
> Do you have any figures for the effects on compilation time or memory
> usage from the increase in size of these structures?
>
Regarding only c_token, we will have the same size if we optimize the
size. Although I did not calculate the size of other structs, we might
not see any significant performance change? I am taking benchmarks and
will let you know once it is done.
> --
> Joseph S. Myers
> joseph@codesourcery.com
More information about the Libstdc++
mailing list