[PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space

Thu Sep 14 21:44:06 GMT 2023

On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers <joseph@codesourcery.com> wrote:
>
> On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:
>
> > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> > index 545f0f4d9eb..eed6deaf0f8 100644
> > --- a/gcc/c/c-parser.h
> > +++ b/gcc/c/c-parser.h
> > @@ -51,14 +51,14 @@ enum c_id_kind {
> >  /* A single C token after string literal concatenation and conversion
> >     of preprocessing tokens to tokens.  */
> >  struct GTY (()) c_token {
> > +  /* If this token is a keyword, this value indicates which keyword.
> > +     Otherwise, this value is RID_MAX.  */
> > +  ENUM_BITFIELD (rid) keyword : 16;
> >    /* The kind of token.  */
> >    ENUM_BITFIELD (cpp_ttype) type : 8;
> >    /* If this token is a CPP_NAME, this value indicates whether also
> >       declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
> >    ENUM_BITFIELD (c_id_kind) id_kind : 8;
> > -  /* If this token is a keyword, this value indicates which keyword.
> > -     Otherwise, this value is RID_MAX.  */
> > -  ENUM_BITFIELD (rid) keyword : 8;
> >    /* If this token is a CPP_PRAGMA, this indicates the pragma that
> >       was seen.  Otherwise it is PRAGMA_NONE.  */
> >    ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>
> If you want to optimize layout, I'd expect flags to move so it can share
> the same 32-bit unit as the pragma_kind bit-field (not sure if any changes
> should be made to the declaration of flags to maximise the chance of such
> sharing across different host bit-field ABIs).
>

Thank you for your review!

I did not make this change aggressively, but we can do the following
to minimize the fragmentation:

struct GTY (()) c_token {
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  unsigned char flags; /* 1 byte */
}

Supposing a pointer size is 8 bytes and int is 4 bytes, the struct
size would be 24 bytes. The internal fragmentation would be 0 bytes,
and the external fragmentation is 6 bytes since the overall struct
alignment requirement is $K_{max} = 8$ from the pointer.

Here is the original struct before making keyword 16-bit. The overall
struct alignment requirement is $K_{max} = 8$ from the pointer. This
struct size would be 24 bytes since the internal fragmentation is 4
bytes (after location), and the external fragmentation is 3 bytes.

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  unsigned char flags; /* 1 byte */
}

If we keep the original order with the 16-bit keyword, the struct size
would be 32 bytes (my current implementation as well, I will update
this patch).

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  unsigned char flags; /* 1 byte */
}

Likewise, the overall struct alignment requirement is $K_{max} = 8$
from the pointer. The internal fragmentation would be 7 bytes (3 bytes
after pragma_kind + 4 bytes after location), and the external
fragmentation would be 7 bytes.

I think optimizing the size is worth doing unless this breaks GCC.

> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 6cbb9a8e031..3c3c482c6ce 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
> >  /* A C++ token.  */
> >
> >  struct GTY (()) cp_token {
> > -  /* The kind of token.  */
> > -  enum cpp_ttype type : 8;
> >    /* If this token is a keyword, this value indicates which keyword.
> >       Otherwise, this value is RID_MAX.  */
> > -  enum rid keyword : 8;
> > +  enum rid keyword : 16;
> > +  /* The kind of token.  */
> > +  enum cpp_ttype type : 8;
> >    /* Token flags.  */
> >    unsigned char flags;
> >    /* True if this token is from a context where it is implicitly extern "C" */
>
> You're missing an update to the "3 unused bits." comment further down.
>
> > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
> >    unsigned int directive_index : 7;  /* If is_directive,
> >                                          then index into directive table.
> >                                          Otherwise, a NODE_OPERATOR.  */
> > -  unsigned int rid_code : 8;         /* Rid code - for front ends.  */
> > +  unsigned int rid_code : 16;                /* Rid code - for front ends.  */
> >    unsigned int flags : 9;            /* CPP flags.  */
> >    ENUM_BITFIELD(node_type) type : 2; /* CPP node type.  */
>
> You're missing an update to the "5 bits spare." comment further down.
>

Thank you!

> Do you have any figures for the effects on compilation time or memory
> usage from the increase in size of these structures?
>

Regarding only c_token, we will have the same size if we optimize the
size. Although I did not calculate the size of other structs, we might
not see any significant performance change? I am taking benchmarks and
will let you know once it is done.

> --
> Joseph S. Myers
> joseph@codesourcery.com