This is the mail archive of the
cygwin-developers
mailing list for the Cygwin project.
Re: About the dll search algorithm of dlopen (patch-r2)
On 08/22/2016 02:15 PM, Michael Haubenwallner wrote:
> On 08/20/2016 09:32 PM, Corinna Vinschen wrote:
>>>>>
>>>>> One way around YA code duplication could be some kind of path iterator
>>>>> class which could be used from find_exec as well as from
>>>>> get_full_path_of_dll.
>
>>>> 0001.patch is a draft for some new cygwin::pathfinder class, with
>>>> 0002.patch adding the executable's directory as searchpath, and
>>>> 0003.patch to search the PATH environment as well.
>>>>
>>>> Thoughts?
>>
>> Ok, that might be disappointing now because you already put so much work
>> into it, but I actually expected some more discussion first. I have two
>> problem with this.
>>
>> I'm not a big fan of templates.
>
> Never mind, it's been some template exercise to me anyway.
>
>> What I had in mind was a *simple* class which gets told if it searches
>> for libs or executables and then checks the different paths accordingly,
>> kind of a copy of find_exec as a class, just additionally handling the
>> prefix issue for DLLs.
What about this one: 've dropped any templates now,
while keeping the basenames logic inside dlopen.
/haubi/
>
> What I'm more interested in for such a class is the actual API for use
> by dlopen() and exec(), and the final list of files searched for - with
> these use cases coming to my mind:
>
> Libraries/dlls with final search path "/lib:/morelibs":
> L1) dlopen("libN.so")
> L2) dlopen("libN.dll")
> L3) dlopen("cygN.dll")
> L4) dlopen("N.so")
> L5) dlopen("N.dll")
> Executables with final search path "/bin:/moreexes"
> X1) exec("X")
> X2) exec("X.exe")
> X3) exec("X.com")
>
> Instead of API calls similar to:
> L1) find(dll, "N", ["/lib", "/morelibs"])
> L2) find(dll, "N", ["/lib", "/morelibs"])
> L3) find(dll, "N", ["/lib", "/morelibs"])
> L4) find(dll, "N", ["/lib", "/morelibs"])
> L5) find(dll, "N", ["/lib", "/morelibs"])
> X1) find(exe, "X", ["/bin", "/moreexes"])
> X2) find(exe, "X", ["/bin", "/moreexes"])
> X3) find(exe, "X", ["/bin", "/moreexes"])
> it feels necessary to support more explicit naming, as in:
> L1) find(["libN.so", "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"])
> L2) find([ "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"])
> L3) find([ "cygN.dll", "libN.dll"], ["/lib/../bin","/lib","/morelibs"])
> L4) find(["N.so", "N.dll" ], ["/lib/../bin","/lib","/morelibs"])
> L5) find([ "N.dll" ], ["/lib/../bin","/lib","/morelibs"])
> X1) find(["X", "X.exe","X.com"], ["/bin","/moreexes"])
> X2) find(["X", "X.exe" ], ["/bin","/moreexes"])
> X3) find(["X", "X.com" ], ["/bin","/moreexes"])
>
> Where the find method does not need to actually know whether it searches
> for a dll or an exe, but dlopen() and exec() instead define the file
> names to search for. This is what the patch draft does in dlopen.
>
>>>>>>>> *) The directory of the current main executable should be searched
>>>>>>>> after LD_LIBRARY_PATH and before /usr/bin:/usr/lib.
>>>>>>>> And PATH should be searched before /usr/bin:/usr/lib as well.
>>>>>>>
>>>>>>> Checking the executable path and $PATH are Windows concepts. dlopen
>>>>>>> doesn't do that on POSIX systems and we're not doing that either.
>>>>>>
>>>>>> Agreed, but POSIX also does have the concept of embedded RUNPATH,
>>>>>> which is completely missing in Cygwin as far as I can see.
>>>>>
>>>>> RPATH and RUNPATH are ELF dynamic loader features, not supported by
>>>>> PE/COFF.
>>>>
>>>> In any case, to me it does feel quite important to have the (almost) same
>>>> dll search algorithm with dlopen() as with CreateProcess().
>>
>> Last but not least I'm not yet convinced if it's *really* a good idea to
>> prepend the executable path to the DLL search path unconditionally. Be
>> it as it is in terms of DT_RUNPATH, why is the application dir a good
>> choice at all, unless we're talking Windows semantics? Which we don't.
>> Also, if loading from the applications dir from dlopen is important for
>> you, you can emulate it by adding the application dir to LD_LIBRAYR_PATH.
>
> As long as there is lack of a Cygwin specific dll loader to find the
> dlls to load during process startup, we're bound to Windows semantics.
>
> For dlopen, it is more important to find the same dll file as would be
> found when the exe was linked against that dll file, rather than using
> the Linux-known algorithm and environment variables - and differ from
> process startup: Both really should result in the same algorithm here,
> even if that means some difference compared to Linux.
>
> As far as I understand, lack of DT_RUNPATH (besides /etc/ld.so.conf)
> support during process start was the main reason for the dlls to install
> into /lib/../bin instead of /lib at all, to be found at process start
> because of residing in the application's bin dir:
> Why should that be different for dlopen?
>
>> I checked for the usage of DT_RUNPATH/DT_RPATH on Fedora 23 and only a
>> limited number of packages use it (texlive, samba, python, man-db,
>> swipl, and a few more). Some of them, like texlive, even use it wrongly,
>> with RPATH pointing to a non-existing build dir. There are also a few
>> stray "/usr/lib64" settings, but all in all it's not used to point to
>> the dir the application is installed to, but rather to some package specific
>> subdir, e.g. /usr/lib64/samba, /usr/lib64/swipl-7.2.3/lib/x86_64-linux,
>> etc.
>
> On Linux, the binaries installed in /usr usually rely on the Linux
> loader to be configured via /etc/ld.so.conf to find their runtime
> libs in /usr/lib.
>
> Please remember: This whole thing is not a problem with packages
> installed to /usr, but with packages installed to /somewhere/else
> that provide runtime libraries that are also available in /usr.
>
> Using LD_LIBRARY_PATH pointing to /somewhere/else/lib may break the
> binaries found in /usr/bin - and agreed, searching PATH doesn't make
> it better, as PATH is the "LD_LIBRARY_PATH" for Windows.
>
>> IMHO this means just adding the applications bin dir is most of the time
>> an unused or even wrong workaround.
>
> Although GetModuleHandle may reduce that pressure for dlopen - as long as
> the applications bin dir is searched at process start, it really should
> be searched by dlopen too, even if for /usr/bin/* this might indeed become
> redundant, as we always add /usr/bin in dlopen - which really mimics
> the /etc/ld.so.conf content actually, although that one is unavailable
> to process startup.
>
> Thanks!
> /haubi/
>
>From ff75702593aed447f695efa37fc3312495681f8f Mon Sep 17 00:00:00 2001
From: Michael Haubenwallner <michael.haubenwallner@ssi-schaefer.com>
Date: Tue, 31 May 2016 13:09:11 +0000
Subject: [PATCH 1/3] dlopen: search each name within one single search dir
Search x/bin:x/lib if searching in x/lib.
Introduces and uses new pathfinder class, which
introduces and uses new vstrlist class.
---
winsup/cygwin/dlfcn.cc | 71 ++++++-------
winsup/cygwin/pathfinder.h | 112 ++++++++++++++++++++
winsup/cygwin/vstrlist.h | 253 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 396 insertions(+), 40 deletions(-)
create mode 100644 winsup/cygwin/pathfinder.h
create mode 100644 winsup/cygwin/vstrlist.h
diff --git a/winsup/cygwin/dlfcn.cc b/winsup/cygwin/dlfcn.cc
index 255a6d5..da20a58 100644
--- a/winsup/cygwin/dlfcn.cc
+++ b/winsup/cygwin/dlfcn.cc
@@ -20,6 +20,7 @@ details. */
#include "cygtls.h"
#include "tls_pbuf.h"
#include "ntdll.h"
+#include "pathfinder.h"
static void
set_dl_error (const char *str)
@@ -28,29 +29,6 @@ set_dl_error (const char *str)
_my_tls.locals.dl_error = 1;
}
-/* Look for an executable file given the name and the environment
- variable to use for searching (eg., PATH); returns the full
- pathname (static buffer) if found or NULL if not. */
-inline const char *
-check_path_access (const char *mywinenv, const char *name, path_conv& buf)
-{
- return find_exec (name, buf, mywinenv, FE_NNF | FE_DLL);
-}
-
-/* Search LD_LIBRARY_PATH for dll, if it exists. Search /usr/bin and /usr/lib
- by default. Return valid full path in path_conv real_filename. */
-static inline bool
-gfpod_helper (const char *name, path_conv &real_filename)
-{
- if (strchr (name, '/'))
- real_filename.check (name, PC_SYM_FOLLOW | PC_NULLEMPTY);
- else if (!check_path_access ("LD_LIBRARY_PATH", name, real_filename))
- check_path_access ("/usr/bin:/usr/lib", name, real_filename);
- if (!real_filename.exists ())
- real_filename.error = ENOENT;
- return !real_filename.error;
-}
-
static bool
get_full_path_of_dll (const char* str, path_conv &real_filename)
{
@@ -63,38 +41,50 @@ get_full_path_of_dll (const char* str, path_conv &real_filename)
return false; /* Yes. Let caller deal with it. */
}
- tmp_pathbuf tp;
- char *name = tp.c_get ();
+ pathfinder::basenamelist basenames;
- strcpy (name, str); /* Put it somewhere where we can manipulate it. */
+ const char *basename = strrchr (str, '/');
+ basename = basename ? basename + 1 : str;
- char *basename = strrchr (name, '/');
- basename = basename ? basename + 1 : name;
- char *suffix = strrchr (name, '.');
- if (suffix && suffix < basename)
- suffix = NULL;
+ int baselen = str + len - basename;
+ const char *suffix = strrchr (basename, '.');
+ char const * ext = "";
+ int extlen = 0;
/* Is suffix ".so"? */
if (suffix && !strcmp (suffix, ".so"))
{
/* Does the file exist? */
- if (gfpod_helper (name, real_filename))
- return true;
+ basenames.appendv (basename, baselen, NULL);
/* No, replace ".so" with ".dll". */
- strcpy (suffix, ".dll");
+ baselen -= 3;
+ ext = ".dll";
+ extlen = 4;
}
/* Does the filename start with "lib"? */
if (!strncmp (basename, "lib", 3))
{
/* Yes, replace "lib" with "cyg". */
- strncpy (basename, "cyg", 3);
- /* Does the file exist? */
- if (gfpod_helper (name, real_filename))
- return true;
+ basenames.appendv ("cyg", 3, basename+3, baselen-3, ext, extlen, NULL);
/* No, revert back to "lib". */
- strncpy (basename, "lib", 3);
}
- if (gfpod_helper (name, real_filename))
+ basenames.appendv (basename, baselen, ext, extlen, NULL);
+
+ pathfinder finder (basenames);
+
+ if (basename > str)
+ finder.add_searchdir (str, basename - 1 - str);
+ else
+ {
+ /* NOTE: The Windows loader (for linked dlls) does
+ not use the LD_LIBRARY_PATH environment variable. */
+ finder.add_envsearchpath ("LD_LIBRARY_PATH");
+
+ /* Finally we better have some fallback. */
+ finder.add_searchdir ("/usr/lib", -1);
+ }
+
+ if (finder.check_path_access (real_filename))
return true;
/* If nothing worked, create a relative path from the original incoming
@@ -113,6 +103,7 @@ dlopen (const char *name, int flags)
{
void *ret = NULL;
+ debug_printf ("flags %d for %s", flags, name);
if (name == NULL)
{
ret = (void *) GetModuleHandle (NULL); /* handle for the current module */
diff --git a/winsup/cygwin/pathfinder.h b/winsup/cygwin/pathfinder.h
new file mode 100644
index 0000000..3453692
--- /dev/null
+++ b/winsup/cygwin/pathfinder.h
@@ -0,0 +1,112 @@
+/* pathfinder.h: find one of multiple file names in path
+
+ Copyright 2016 Red Hat, Inc.
+
+This file is part of Cygwin.
+
+This software is a copyrighted work licensed under the terms of the
+Cygwin license. Please consult the file "CYGWIN_LICENSE" for
+details. */
+
+#include "vstrlist.h"
+
+#ifdef __cplusplus
+
+class pathfinder
+{
+public:
+ typedef vstrlist basenamelist;
+
+private:
+ pathfinder ();
+ pathfinder (pathfinder const &);
+ pathfinder & operator = (pathfinder const &);
+
+ basenamelist basenames_;
+ size_t basenames_maxlen_;
+
+ typedef vstrlist searchbufferlist;
+ searchbufferlist searchbuffers_;
+
+public:
+ ~pathfinder () {}
+
+ /* We need the basenames to search for first, to allow for optimized
+ memory allocation of each searchpath + basename combination.
+ The incoming list of basenames is emptied (ownership take over). */
+ pathfinder (basenamelist & basenames)
+ : basenames_ ()
+ , basenames_maxlen_ ()
+ , searchbuffers_()
+ {
+ basenames.swap(basenames_);
+
+ for (basenamelist::iterator basename = basenames_.begin ();
+ basename != basenames_.end ();
+ ++ basename)
+ {
+ if (basenames_maxlen_ < basename->stringlength ())
+ basenames_maxlen_ = basename->stringlength ();
+ }
+ }
+
+ void add_searchdir (const char *dir, int dirlen)
+ {
+ if (dirlen < 0)
+ dirlen = strlen (dir);
+
+ if (!dirlen)
+ return;
+
+ /* Search "x/bin:x/lib" for "x/lib" */
+ if (dirlen >=4 && !strncmp (dir + dirlen - 4, "/lib", 4))
+ /* prealloc buffer in searchdir for any basename we will search for */
+ searchbuffers_.appendv (dir, dirlen - 4, "/bin", 4, "/", 1 + basenames_maxlen_, NULL);
+
+ /* prealloc buffer in searchdir for any basename we will search for */
+ searchbuffers_.appendv (dir, dirlen, "/", 1 + basenames_maxlen_, NULL);
+ }
+
+ void add_searchpath (const char *path)
+ {
+ while (path && *path)
+ {
+ const char *next = strchr (path, ':');
+ add_searchdir (path, next ? next - path : -1);
+ path = next ? next + 1 : next;
+ }
+ }
+
+ void add_envsearchpath (const char *envpath)
+ {
+ add_searchpath (getenv (envpath));
+ }
+
+ /* Within each searchdir registered, try each registered basename to
+ find as executable. Returns found dir/basename in real_filename.
+ Returns true when found. */
+ bool check_path_access (path_conv& real_filename)
+ {
+ for (searchbufferlist::iterator dir = searchbuffers_.begin ();
+ dir != searchbuffers_.end ();
+ ++dir)
+ for (basenamelist::iterator name = basenames_.begin ();
+ name != basenames_.end ();
+ ++name)
+ {
+ /* complete the filename path to search for */
+ memcpy (dir->buffer () + dir->stringlength (), name->string (), name->stringlength () + 1);
+ debug_printf ("trying %s", dir->buffer ());
+ real_filename.check (dir->string (), PC_SYM_FOLLOW | PC_POSIX);
+ if (real_filename.exists () && !real_filename.isdir ())
+ {
+ debug_printf (" found %s", dir->buffer ());
+ return true;
+ }
+ }
+ real_filename.error = ENOENT;
+ return !real_filename.error;
+ }
+};
+
+#endif /* __cplusplus */
diff --git a/winsup/cygwin/vstrlist.h b/winsup/cygwin/vstrlist.h
new file mode 100644
index 0000000..ecbdc64
--- /dev/null
+++ b/winsup/cygwin/vstrlist.h
@@ -0,0 +1,253 @@
+/* vstrlist.h: vstrlist
+
+ Copyright 2016 Red Hat, Inc.
+
+This file is part of Cygwin.
+
+This software is a copyrighted work licensed under the terms of the
+Cygwin license. Please consult the file "CYGWIN_LICENSE" for
+details. */
+
+#ifdef __cplusplus
+
+class vstrlist
+{
+public:
+ class member
+ {
+ friend class vstrlist;
+ friend class iterator;
+
+ member * prev_;
+ member * next_;
+ size_t bufferlength_;
+ size_t stringlength_;
+ char buffer_[1]; /* we always have space for a trailing zero */
+
+ /* no copy */
+ member (member const &);
+ member & operator = (member const &);
+
+ /* anchor */
+ void * operator new (size_t class_size)
+ {
+ return cmalloc_abort (HEAP_STR, class_size);
+ }
+
+ /* anchor */
+ member ()
+ : prev_ (this)
+ , next_ (this)
+ , bufferlength_ (0)
+ , stringlength_ (0)
+ , buffer_ ()
+ {}
+
+ /* entry: determine memory size from args */
+ void * operator new (size_t class_size, char const * part0, va_list parts)
+ {
+ char const * part = part0;
+ va_list partsdup;
+ va_copy (partsdup, parts);
+ size_t bufferlength = 0;
+ while (part)
+ {
+ int partlen = va_arg (partsdup, int);
+ if (partlen < 0)
+ partlen = strlen (part);
+ bufferlength += partlen;
+ part = va_arg (partsdup, char const *);
+ }
+ va_end (partsdup);
+
+ return cmalloc_abort (HEAP_STR, class_size + bufferlength);
+ }
+
+ /* entry: instantly insert into list */
+ member (member * before, char const * part0, va_list parts)
+ : prev_ (before->prev_)
+ , next_ (before)
+ , bufferlength_ (0)
+ , stringlength_ ()
+ , buffer_ ()
+ {
+ prev_->next_ = this;
+ next_->prev_ = this;
+
+ char * dest = buffer_;
+ char const * part = part0;
+ va_list partsdup;
+ va_copy (partsdup, parts);
+ while (part)
+ {
+ int partlen = va_arg (partsdup, int);
+ if (partlen < 0)
+ {
+ char * old = dest;
+ dest = stpcpy (old, part);
+ partlen = dest - old;
+ }
+ else
+ dest = stpncpy (dest, part, partlen);
+ bufferlength_ += partlen;
+ part = va_arg (partsdup, const char *);
+ }
+ va_end (partsdup);
+ *dest = (char)0;
+ stringlength_ = dest - buffer_;
+ }
+
+ void operator delete (void * p)
+ {
+ cfree (p);
+ }
+
+ /* remove entry from list */
+ ~member ()
+ {
+ member * next = next_;
+ member * prev = prev_;
+ next->prev_ = prev;
+ prev->next_ = next;
+ prev_ = NULL;
+ next_ = NULL;
+ }
+
+ public:
+ member const * next () const { return next_; }
+ member * next () { return next_; }
+ member const * prev () const { return next_; }
+ member * prev () { return next_; }
+
+ /* always is a readonly string */
+ char const * string () const { return buffer_; }
+ size_t stringlength () const { return stringlength_; }
+
+ /* give write access to the buffer when writeable */
+ char * buffer () { return buffer_; }
+ size_t bufferlength () { return bufferlength_; }
+ };
+
+ class iterator
+ {
+ friend class vstrlist;
+
+ member * current_;
+
+ iterator ();
+
+ iterator (member * current)
+ : current_ (current)
+ {}
+
+ public:
+ iterator (iterator const & rhs)
+ : current_ (rhs.current_)
+ {}
+
+ iterator & operator = (iterator const & rhs)
+ {
+ current_ = rhs.current_;
+ return *this;
+ }
+
+ iterator & operator ++ ()
+ {
+ current_ = current_->next ();
+ return *this;
+ }
+
+ iterator operator ++ (int)
+ {
+ iterator ret (*this);
+ current_ = current_->next ();
+ return ret;
+ }
+
+ iterator & operator -- ()
+ {
+ current_ = current_->prev ();
+ return *this;
+ }
+
+ iterator operator -- (int)
+ {
+ iterator ret (*this);
+ current_ = current_->prev ();
+ return ret;
+ }
+
+ bool operator == (iterator const & rhs) const
+ {
+ return current_ == rhs.current_;
+ }
+
+ bool operator != (iterator const & rhs) const
+ {
+ return current_ != rhs.current_;
+ }
+
+ member const & operator * () const { return *current_; }
+ member & operator * () { return *current_; }
+ member const * operator -> () const { return current_; }
+ member * operator -> () { return current_; }
+
+ void remove ()
+ {
+ member * old = current_;
+ ++ *this;
+ delete old;
+ }
+ };
+
+private:
+ member * anchor_;
+
+ /* no copy */
+ vstrlist (vstrlist const &);
+ vstrlist & operator = (vstrlist const &);
+
+public:
+ iterator begin () { return iterator (anchor_->next ()); }
+ iterator end () { return iterator (anchor_ ); }
+ iterator rbegin () { return iterator (anchor_->prev ()); }
+ iterator rend () { return iterator (anchor_ ); }
+
+ vstrlist ()
+ : anchor_ (new member ())
+ {}
+
+ ~vstrlist ()
+ {
+ for (iterator it = begin (); it != end (); it.remove ());
+ delete anchor_;
+ }
+
+ void swap (vstrlist & that)
+ {
+ member * old = anchor_;
+ anchor_ = that.anchor_;
+ that.anchor_ = old;
+ }
+
+ member * appendv (char const * part0, va_list parts)
+ {
+ return new (part0, parts) member (anchor_, part0, parts);
+ }
+
+ member * appendv (char const * part0, ...)
+ {
+ va_list parts;
+ va_start (parts, part0);
+ member * ret = appendv (part0, parts);
+ va_end (parts);
+ return ret;
+ }
+
+ member * append (char const * part)
+ {
+ return appendv (part, strlen(part));
+ }
+};
+
+#endif /* __cplusplus */
--
2.8.3