This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: About the dll search algorithm of dlopen (patch-r2)


On 08/22/2016 02:15 PM, Michael Haubenwallner wrote:
> On 08/20/2016 09:32 PM, Corinna Vinschen wrote:
>>>>>
>>>>> One way around YA code duplication could be some kind of path iterator
>>>>> class which could be used from find_exec as well as from
>>>>> get_full_path_of_dll.
> 
>>>> 0001.patch is a draft for some new cygwin::pathfinder class, with
>>>> 0002.patch adding the executable's directory as searchpath, and
>>>> 0003.patch to search the PATH environment as well.
>>>>
>>>> Thoughts?
>>
>> Ok, that might be disappointing now because you already put so much work
>> into it, but I actually expected some more discussion first.  I have two
>> problem with this.
>>
>> I'm not a big fan of templates.
> 
> Never mind, it's been some template exercise to me anyway.
> 
>> What I had in mind was a *simple* class which gets told if it searches
>> for libs or executables and then checks the different paths accordingly,
>> kind of a copy of find_exec as a class, just additionally handling the
>> prefix issue for DLLs.

What about this one: 've dropped any templates now,
while keeping the basenames logic inside dlopen.

/haubi/

> 
> What I'm more interested in for such a class is the actual API for use
> by dlopen() and exec(), and the final list of files searched for - with
> these use cases coming to my mind:
> 
> Libraries/dlls with final search path "/lib:/morelibs":
>   L1) dlopen("libN.so")
>   L2) dlopen("libN.dll")
>   L3) dlopen("cygN.dll")
>   L4) dlopen("N.so")
>   L5) dlopen("N.dll")
> Executables with final search path "/bin:/moreexes"
>   X1) exec("X")
>   X2) exec("X.exe")
>   X3) exec("X.com")
> 
> Instead of API calls similar to:
>   L1) find(dll, "N", ["/lib", "/morelibs"])
>   L2) find(dll, "N", ["/lib", "/morelibs"])
>   L3) find(dll, "N", ["/lib", "/morelibs"])
>   L4) find(dll, "N", ["/lib", "/morelibs"])
>   L5) find(dll, "N", ["/lib", "/morelibs"])
>   X1) find(exe, "X", ["/bin", "/moreexes"])
>   X2) find(exe, "X", ["/bin", "/moreexes"])
>   X3) find(exe, "X", ["/bin", "/moreexes"])
> it feels necessary to support more explicit naming, as in:
>   L1) find(["libN.so", "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"])
>   L2) find([           "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"])
>   L3) find([           "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"])
>   L4) find(["N.so", "N.dll"                  ], ["/lib/../bin","/lib","/morelibs"])
>   L5) find([        "N.dll"                  ], ["/lib/../bin","/lib","/morelibs"])
>   X1) find(["X", "X.exe","X.com"], ["/bin","/moreexes"])
>   X2) find(["X", "X.exe"        ], ["/bin","/moreexes"])
>   X3) find(["X", "X.com"        ], ["/bin","/moreexes"])
> 
> Where the find method does not need to actually know whether it searches
> for a dll or an exe, but dlopen() and exec() instead define the file
> names to search for. This is what the patch draft does in dlopen.
> 
>>>>>>>> *) The directory of the current main executable should be searched
>>>>>>>>    after LD_LIBRARY_PATH and before /usr/bin:/usr/lib.
>>>>>>>>    And PATH should be searched before /usr/bin:/usr/lib as well.
>>>>>>>
>>>>>>> Checking the executable path and $PATH are Windows concepts.  dlopen
>>>>>>> doesn't do that on POSIX systems and we're not doing that either.
>>>>>>
>>>>>> Agreed, but POSIX also does have the concept of embedded RUNPATH,
>>>>>> which is completely missing in Cygwin as far as I can see.
>>>>>
>>>>> RPATH and RUNPATH are ELF dynamic loader features, not supported by
>>>>> PE/COFF.
>>>>
>>>> In any case, to me it does feel quite important to have the (almost) same
>>>> dll search algorithm with dlopen() as with CreateProcess().
>>
>> Last but not least I'm not yet convinced if it's *really* a good idea to
>> prepend the executable path to the DLL search path unconditionally.  Be
>> it as it is in terms of DT_RUNPATH, why is the application dir a good
>> choice at all, unless we're talking Windows semantics?  Which we don't.
>> Also, if loading from the applications dir from dlopen is important for
>> you, you can emulate it by adding the application dir to LD_LIBRAYR_PATH.
> 
> As long as there is lack of a Cygwin specific dll loader to find the
> dlls to load during process startup, we're bound to Windows semantics.
> 
> For dlopen, it is more important to find the same dll file as would be
> found when the exe was linked against that dll file, rather than using
> the Linux-known algorithm and environment variables - and differ from
> process startup: Both really should result in the same algorithm here,
> even if that means some difference compared to Linux.
> 
> As far as I understand, lack of DT_RUNPATH (besides /etc/ld.so.conf)
> support during process start was the main reason for the dlls to install
> into /lib/../bin instead of /lib at all, to be found at process start
> because of residing in the application's bin dir:
> Why should that be different for dlopen?
> 
>> I checked for the usage of DT_RUNPATH/DT_RPATH on Fedora 23 and only a
>> limited number of packages use it (texlive, samba, python, man-db,
>> swipl, and a few more).  Some of them, like texlive, even use it wrongly,
>> with RPATH pointing to a non-existing build dir.  There are also a few
>> stray "/usr/lib64" settings, but all in all it's not used to point to
>> the dir the application is installed to, but rather to some package specific
>> subdir, e.g.  /usr/lib64/samba, /usr/lib64/swipl-7.2.3/lib/x86_64-linux,
>> etc.
> 
> On Linux, the binaries installed in /usr usually rely on the Linux
> loader to be configured via /etc/ld.so.conf to find their runtime
> libs in /usr/lib.
> 
> Please remember: This whole thing is not a problem with packages
> installed to /usr, but with packages installed to /somewhere/else
> that provide runtime libraries that are also available in /usr.
> 
> Using LD_LIBRARY_PATH pointing to /somewhere/else/lib may break the
> binaries found in /usr/bin - and agreed, searching PATH doesn't make
> it better, as PATH is the "LD_LIBRARY_PATH" for Windows.
> 
>> IMHO this means just adding the applications bin dir is most of the time
>> an unused or even wrong workaround.
> 
> Although GetModuleHandle may reduce that pressure for dlopen - as long as
> the applications bin dir is searched at process start, it really should
> be searched by dlopen too, even if for /usr/bin/* this might indeed become
> redundant, as we always add /usr/bin in dlopen - which really mimics
> the /etc/ld.so.conf content actually, although that one is unavailable
> to process startup.
> 
> Thanks!
> /haubi/
> 

>From ff75702593aed447f695efa37fc3312495681f8f Mon Sep 17 00:00:00 2001
From: Michael Haubenwallner <michael.haubenwallner@ssi-schaefer.com>
Date: Tue, 31 May 2016 13:09:11 +0000
Subject: [PATCH 1/3] dlopen: search each name within one single search dir

Search x/bin:x/lib if searching in x/lib.

Introduces and uses new pathfinder class, which
introduces and uses new vstrlist class.
---
 winsup/cygwin/dlfcn.cc     |  71 ++++++-------
 winsup/cygwin/pathfinder.h | 112 ++++++++++++++++++++
 winsup/cygwin/vstrlist.h   | 253 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 396 insertions(+), 40 deletions(-)
 create mode 100644 winsup/cygwin/pathfinder.h
 create mode 100644 winsup/cygwin/vstrlist.h

diff --git a/winsup/cygwin/dlfcn.cc b/winsup/cygwin/dlfcn.cc
index 255a6d5..da20a58 100644
--- a/winsup/cygwin/dlfcn.cc
+++ b/winsup/cygwin/dlfcn.cc
@@ -20,6 +20,7 @@ details. */
 #include "cygtls.h"
 #include "tls_pbuf.h"
 #include "ntdll.h"
+#include "pathfinder.h"
 
 static void
 set_dl_error (const char *str)
@@ -28,29 +29,6 @@ set_dl_error (const char *str)
   _my_tls.locals.dl_error = 1;
 }
 
-/* Look for an executable file given the name and the environment
-   variable to use for searching (eg., PATH); returns the full
-   pathname (static buffer) if found or NULL if not. */
-inline const char *
-check_path_access (const char *mywinenv, const char *name, path_conv& buf)
-{
-  return find_exec (name, buf, mywinenv, FE_NNF | FE_DLL);
-}
-
-/* Search LD_LIBRARY_PATH for dll, if it exists.  Search /usr/bin and /usr/lib
-   by default.  Return valid full path in path_conv real_filename. */
-static inline bool
-gfpod_helper (const char *name, path_conv &real_filename)
-{
-  if (strchr (name, '/'))
-    real_filename.check (name, PC_SYM_FOLLOW | PC_NULLEMPTY);
-  else if (!check_path_access ("LD_LIBRARY_PATH", name, real_filename))
-    check_path_access ("/usr/bin:/usr/lib", name, real_filename);
-  if (!real_filename.exists ())
-    real_filename.error = ENOENT;
-  return !real_filename.error;
-}
-
 static bool
 get_full_path_of_dll (const char* str, path_conv &real_filename)
 {
@@ -63,38 +41,50 @@ get_full_path_of_dll (const char* str, path_conv &real_filename)
       return false;		/* Yes.  Let caller deal with it. */
     }
 
-  tmp_pathbuf tp;
-  char *name = tp.c_get ();
+  pathfinder::basenamelist basenames;
 
-  strcpy (name, str);	/* Put it somewhere where we can manipulate it. */
+  const char *basename = strrchr (str, '/');
+  basename = basename ? basename + 1 : str;
 
-  char *basename = strrchr (name, '/');
-  basename = basename ? basename + 1 : name;
-  char *suffix = strrchr (name, '.');
-  if (suffix && suffix < basename)
-    suffix = NULL;
+  int baselen = str + len - basename;
+  const char *suffix = strrchr (basename, '.');
 
+  char const * ext = "";
+  int extlen = 0;
   /* Is suffix ".so"? */
   if (suffix && !strcmp (suffix, ".so"))
     {
       /* Does the file exist? */
-      if (gfpod_helper (name, real_filename))
-	return true;
+      basenames.appendv (basename, baselen, NULL);
       /* No, replace ".so" with ".dll". */
-      strcpy (suffix, ".dll");
+      baselen -= 3;
+      ext = ".dll";
+      extlen = 4;
     }
   /* Does the filename start with "lib"? */
   if (!strncmp (basename, "lib", 3))
     {
       /* Yes, replace "lib" with "cyg". */
-      strncpy (basename, "cyg", 3);
-      /* Does the file exist? */
-      if (gfpod_helper (name, real_filename))
-	return true;
+      basenames.appendv ("cyg", 3, basename+3, baselen-3, ext, extlen, NULL);
       /* No, revert back to "lib". */
-      strncpy (basename, "lib", 3);
     }
-  if (gfpod_helper (name, real_filename))
+  basenames.appendv (basename, baselen, ext, extlen, NULL);
+
+  pathfinder finder (basenames);
+
+  if (basename > str)
+    finder.add_searchdir (str, basename - 1 - str);
+  else
+    {
+      /* NOTE: The Windows loader (for linked dlls) does
+	 not use the LD_LIBRARY_PATH environment variable. */
+      finder.add_envsearchpath ("LD_LIBRARY_PATH");
+
+      /* Finally we better have some fallback. */
+      finder.add_searchdir ("/usr/lib", -1);
+    }
+
+  if (finder.check_path_access (real_filename))
     return true;
 
   /* If nothing worked, create a relative path from the original incoming
@@ -113,6 +103,7 @@ dlopen (const char *name, int flags)
 {
   void *ret = NULL;
 
+  debug_printf ("flags %d for %s", flags, name);
   if (name == NULL)
     {
       ret = (void *) GetModuleHandle (NULL); /* handle for the current module */
diff --git a/winsup/cygwin/pathfinder.h b/winsup/cygwin/pathfinder.h
new file mode 100644
index 0000000..3453692
--- /dev/null
+++ b/winsup/cygwin/pathfinder.h
@@ -0,0 +1,112 @@
+/* pathfinder.h: find one of multiple file names in path
+
+   Copyright 2016 Red Hat, Inc.
+
+This file is part of Cygwin.
+
+This software is a copyrighted work licensed under the terms of the
+Cygwin license.  Please consult the file "CYGWIN_LICENSE" for
+details. */
+
+#include "vstrlist.h"
+
+#ifdef __cplusplus
+
+class pathfinder
+{
+public:
+  typedef vstrlist basenamelist;
+
+private:
+  pathfinder ();
+  pathfinder (pathfinder const &);
+  pathfinder & operator = (pathfinder const &);
+
+  basenamelist basenames_;
+  size_t basenames_maxlen_;
+
+  typedef vstrlist searchbufferlist;
+  searchbufferlist searchbuffers_;
+
+public:
+  ~pathfinder () {}
+
+  /* We need the basenames to search for first, to allow for optimized
+     memory allocation of each searchpath + basename combination.
+     The incoming list of basenames is emptied (ownership take over). */
+  pathfinder (basenamelist & basenames)
+    : basenames_ ()
+    , basenames_maxlen_ ()
+    , searchbuffers_()
+  {
+    basenames.swap(basenames_);
+
+    for (basenamelist::iterator basename = basenames_.begin ();
+	 basename != basenames_.end ();
+	 ++ basename)
+      {
+        if (basenames_maxlen_ < basename->stringlength ())
+	  basenames_maxlen_ = basename->stringlength ();
+      }
+  }
+
+  void add_searchdir (const char *dir, int dirlen)
+  {
+      if (dirlen < 0)
+        dirlen = strlen (dir);
+
+      if (!dirlen)
+        return;
+
+      /* Search "x/bin:x/lib" for "x/lib" */
+      if (dirlen >=4 && !strncmp (dir + dirlen - 4, "/lib", 4))
+	/* prealloc buffer in searchdir for any basename we will search for */
+	searchbuffers_.appendv (dir, dirlen - 4, "/bin", 4, "/", 1 + basenames_maxlen_, NULL);
+
+      /* prealloc buffer in searchdir for any basename we will search for */
+      searchbuffers_.appendv (dir, dirlen, "/", 1 + basenames_maxlen_, NULL);
+  }
+
+  void add_searchpath (const char *path)
+  {
+    while (path && *path)
+      {
+        const char *next = strchr (path, ':');
+        add_searchdir (path, next ? next - path : -1);
+	path = next ? next + 1 : next;
+      }
+  }
+
+  void add_envsearchpath (const char *envpath)
+    {
+      add_searchpath (getenv (envpath));
+    }
+
+  /* Within each searchdir registered, try each registered basename to
+     find as executable.  Returns found dir/basename in real_filename.
+     Returns true when found. */
+  bool check_path_access (path_conv& real_filename)
+  {
+    for (searchbufferlist::iterator dir = searchbuffers_.begin ();
+	 dir != searchbuffers_.end ();
+	 ++dir)
+      for (basenamelist::iterator name = basenames_.begin ();
+	   name != basenames_.end ();
+	   ++name)
+	{
+	  /* complete the filename path to search for */
+	  memcpy (dir->buffer () + dir->stringlength (), name->string (), name->stringlength () + 1);
+	  debug_printf ("trying %s", dir->buffer ());
+	  real_filename.check (dir->string (), PC_SYM_FOLLOW | PC_POSIX);
+	  if (real_filename.exists () && !real_filename.isdir ())
+	    {
+	      debug_printf (" found %s", dir->buffer ());
+	      return true;
+	    }
+	}
+    real_filename.error = ENOENT;
+    return !real_filename.error;
+  }
+};
+
+#endif /* __cplusplus */
diff --git a/winsup/cygwin/vstrlist.h b/winsup/cygwin/vstrlist.h
new file mode 100644
index 0000000..ecbdc64
--- /dev/null
+++ b/winsup/cygwin/vstrlist.h
@@ -0,0 +1,253 @@
+/* vstrlist.h: vstrlist
+
+   Copyright 2016 Red Hat, Inc.
+
+This file is part of Cygwin.
+
+This software is a copyrighted work licensed under the terms of the
+Cygwin license.  Please consult the file "CYGWIN_LICENSE" for
+details. */
+
+#ifdef __cplusplus
+
+class vstrlist
+{
+public:
+  class member
+  {
+    friend class vstrlist;
+    friend class iterator;
+
+    member * prev_;
+    member * next_;
+    size_t   bufferlength_;
+    size_t   stringlength_;
+    char     buffer_[1]; /* we always have space for a trailing zero */
+
+    /* no copy */
+    member (member const &);
+    member & operator = (member const &);
+
+    /* anchor */
+    void * operator new (size_t class_size)
+    {
+      return cmalloc_abort (HEAP_STR, class_size);
+    }
+
+    /* anchor */
+    member ()
+      : prev_ (this)
+      , next_ (this)
+      , bufferlength_ (0)
+      , stringlength_ (0)
+      , buffer_ ()
+    {}
+
+    /* entry: determine memory size from args */
+    void * operator new (size_t class_size, char const * part0, va_list parts)
+    {
+      char const * part = part0;
+      va_list partsdup;
+      va_copy (partsdup, parts);
+      size_t bufferlength = 0;
+      while (part)
+	{
+	  int partlen = va_arg (partsdup, int);
+	  if (partlen < 0)
+	    partlen = strlen (part);
+	  bufferlength += partlen;
+	  part = va_arg (partsdup, char const *);
+	}
+      va_end (partsdup);
+
+      return cmalloc_abort (HEAP_STR, class_size + bufferlength);
+    }
+
+    /* entry: instantly insert into list */
+    member (member * before, char const * part0, va_list parts)
+      : prev_ (before->prev_)
+      , next_ (before)
+      , bufferlength_ (0)
+      , stringlength_ ()
+      , buffer_ ()
+    {
+      prev_->next_ = this;
+      next_->prev_ = this;
+
+      char * dest = buffer_;
+      char const * part = part0;
+      va_list partsdup;
+      va_copy (partsdup, parts);
+      while (part)
+	{
+	  int partlen = va_arg (partsdup, int);
+	  if (partlen < 0)
+	    {
+	      char * old = dest;
+	      dest = stpcpy (old, part);
+	      partlen = dest - old;
+	    }
+	  else
+	    dest = stpncpy (dest, part, partlen);
+	  bufferlength_ += partlen;
+	  part = va_arg (partsdup, const char *);
+	}
+      va_end (partsdup);
+      *dest = (char)0;
+      stringlength_ = dest - buffer_;
+    }
+
+    void operator delete (void * p)
+    {
+      cfree (p);
+    }
+
+    /* remove entry from list */
+    ~member ()
+    {
+      member * next = next_;
+      member * prev = prev_;
+      next->prev_ = prev;
+      prev->next_ = next;
+      prev_ = NULL;
+      next_ = NULL;
+    }
+
+  public:
+    member const * next () const { return next_; }
+    member       * next ()       { return next_; }
+    member const * prev () const { return next_; }
+    member       * prev ()       { return next_; }
+
+    /* always is a readonly string */
+    char const * string () const { return buffer_; }
+    size_t stringlength () const { return stringlength_; }
+
+    /* give write access to the buffer when writeable */
+    char       * buffer ()       { return buffer_; }
+    size_t bufferlength ()       { return bufferlength_; }
+  };
+
+  class iterator
+  {
+    friend class vstrlist;
+
+    member * current_;
+
+    iterator ();
+
+    iterator (member * current)
+      : current_ (current)
+    {}
+
+  public:
+    iterator (iterator const & rhs)
+      : current_ (rhs.current_)
+    {}
+
+    iterator & operator = (iterator const & rhs)
+    {
+      current_ = rhs.current_;
+      return *this;
+    }
+
+    iterator & operator ++ ()
+    {
+      current_ = current_->next ();
+      return *this;
+    }
+
+    iterator operator ++ (int)
+    {
+      iterator ret (*this);
+      current_ = current_->next ();
+      return ret;
+    }
+
+    iterator & operator -- ()
+    {
+      current_ = current_->prev ();
+      return *this;
+    }
+
+    iterator operator -- (int)
+    {
+      iterator ret (*this);
+      current_ = current_->prev ();
+      return ret;
+    }
+
+    bool operator == (iterator const & rhs) const
+    {
+      return current_ == rhs.current_;
+    }
+
+    bool operator != (iterator const & rhs) const
+    {
+      return current_ != rhs.current_;
+    }
+
+    member const & operator *  () const { return *current_; }
+    member       & operator *  ()       { return *current_; }
+    member const * operator -> () const { return  current_; }
+    member       * operator -> ()       { return  current_; }
+
+    void remove ()
+    {
+      member * old = current_;
+      ++ *this;
+      delete old;
+    }
+  };
+
+private:
+  member * anchor_;
+
+  /* no copy */
+  vstrlist (vstrlist const &);
+  vstrlist & operator = (vstrlist const &);
+
+public:
+  iterator  begin () { return iterator (anchor_->next ()); }
+  iterator  end   () { return iterator (anchor_         ); }
+  iterator rbegin () { return iterator (anchor_->prev ()); }
+  iterator rend   () { return iterator (anchor_         ); }
+
+  vstrlist ()
+    : anchor_ (new member ())
+  {}
+
+  ~vstrlist ()
+  {
+    for (iterator it = begin (); it != end (); it.remove ());
+    delete anchor_;
+  }
+
+  void swap (vstrlist & that)
+  {
+    member * old = anchor_;
+    anchor_ = that.anchor_;
+    that.anchor_ = old;
+  }
+
+  member * appendv (char const * part0, va_list parts)
+  {
+    return new (part0, parts) member (anchor_, part0, parts);
+  }
+
+  member * appendv (char const * part0, ...)
+  {
+    va_list parts;
+    va_start (parts, part0);
+    member * ret = appendv (part0, parts);
+    va_end (parts);
+    return ret;
+  }
+
+  member * append (char const * part)
+  {
+    return appendv (part, strlen(part));
+  }
+};
+
+#endif /* __cplusplus */
-- 
2.8.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]