Scraping LinkedIn Public Profiles

Hi all! I'm trying to understand the ethics around scraping publicly available sources (i.e. not behind a log-in / paywall) to generate information for companies in a database I'm building for my startup. Wikipedia seems to be pretty generous. LinkedIn seems less so, though there was a lawsuit last year re: HiQ which resulted in the court ordering LinkedIn to allow HiQ to scrape individual users' profiles. LinkedIn's ToS says that "automated scraping" is not allowed, but I'm wondering if the focus is mostly on internal, private profiles / info. Q1 - What is the legality around scraping from publicly available information sources? Q2 - Any experience scraping from public LinkedIn profiles (specifically, company profiles, e.g. year founded, location, paragraph description)?
I'm not a lawyer, and I think you will probably get the best "real" answer to Q1 from one, but in my understanding the ToS covers "public" data too. Just because it is viewable on the internet doesn't mean it's "public"/free; a user gave consent to LinkedIn to display it, not to you. At a startup I was at, we did a lot of this type of thing, especially LinkedIn crawling. Fundamentally, it's their data and you're at their mercy. We lost a ton of valuable data because they changed things around and we could no longer access certain things. That made me reexamine what we were doing and realize we pretty much weren't doing the "above board" thing at all. I would have a solid plan for some kind of above-board partnership longer term if you go down this path. (Which is hard to do if you are in any way competing with them!)
LinkedIn owns its publicly available data. You should use their API if you want to access it. Crawling isn't allowed, to my knowledge. I was asked to do it for a previous job and refused. We ended up using the alumni information of business schools displayed on LinkedIn.
Hey Katia! To get access to LinkedIn's API, as Katerina mentioned, you have to build an app that they have to approve, and they only do so if that app will add value to the whole LinkedIn community, so an internal database will probably not get approved. I have done some crawling on LinkedIn just to play around a bit, and it is pretty doable: if you have a pro account you can access Sales Navigator, find the request that returns a JSON with all the company information, and save that. You can get current headcount, headcount growth, company location, description, and some other stuff. Concerning legality, it's technically not permitted...