conference_footprint

compute the CO2 footprint of an academic conference
git clone https://a3nm.net/git/conference_footprint/

commit edcbb997c5b2264872c74ce04b266a43766ab3a4
parent 1936730c02e2ebe1d3c8cd4180e304a36b370200
Author: Antoine Amarilli <a3nm@a3nm.net>
Date:   Mon,  3 Oct 2022 19:07:44 +0200

fixes following discussion with thomas

Diffstat:
 README.md   | 148 ++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
 addnoise.py |  16 ++++++++++++++++
 co2.py      |  29 ++++++++---------------------
 compute.py  |  37 +++++++++++++++++--------------------
 run.sh      |   4 +++-
 5 files changed, 127 insertions(+), 107 deletions(-)

diff --git a/README.md b/README.md
@@ -6,38 +6,66 @@
 an academic conference. It was used to compute the footprint of the
 [Highlights'22 conference](https://highlights-conference.org/2022/).
 
-## Input data format
-
-The input data should be provided as a CSV field containing the following
-fields:
-- Field 1: Name of participant
-- Field 2: Institution of participant
-- Field 3: 3-letter airport or metropolitan area code of origin (first leg, before the conference)
-- Field 4: Transportation means of the first leg: "train", "plane", "bus/coach", or
-  "other" or "" to mean it is unknown.
-- Field 5: 3-letter code of destination (second leg, after the conference)
-- Field 6: Transportation means of the second leg
-- Field 7: "True" if the participant is extending their stay, i.e., travelling for
-  scientific reasons other than the conference. For such participants, the
-  computation will only take the longest of the two legs into account.
+## Data collection
+
+We collected information about the travel plans of participants using a [web
+form](https://framaforms.org/highlights-participant-travel-information-1664806487)
+([archive](https://web.archive.org/web/20221003161159/https://framaforms.org/highlights-participant-travel-information-1664806487)).
+To ensure that everyone filled the form, the link to payment was only given once
+the form was completed.
+
+We manually removed duplicate records and fake data.
+
+For people who did not fill in the details of their travel, we:
+
+- assumed that they were coming to/from the institution of their first
+  affiliation
+- when the transportation mechanism was unspecified, we assumed that trips of
+  <=400km were done by rail and trips of >400km were done by plane, following:
+  https://github.com/ConferenceCarbonTracker/CarbonFootprintAGU#44-mode-of-transport
+
+Afterwards, we discarded the name and institution of participants.
+
+We manually translated the free-form city and country to a
+machine-understandable location by searching by hand for the closest
+three-letter code (airport or metropolitan area). This step could be automated.
+
+The result is a CSV file in the following format:
+
+- Field 1: 3-letter airport or metropolitan area code of origin (first leg, before the conference)
+- Field 2: Transportation means of the first leg: "train", "plane", or "bus/coach".
+- Field 3: 3-letter code of destination (second leg, after the conference)
+- Field 4: Transportation means of the second leg
+- Additional fields, e.g., a field indicating if the participant is extending
+  their stay for scientific reasons other than the conference.
 
 ## Running the computation
 
 You need python3, standard shell utilities, and `GeodSolve` from Debian package
 `geographiclib-tools`.
 
-Run `./run.sh FILE CODE LAT LON` where:
+Run `./run.sh FILE CODE LAT LON NOISE` where:
 
 - FILE is the CSV file above
 - CODE is the 3-letter code used for local participants (their trips will be
   ignored)
-- LAT and LON are the geographical coordinates
+- LAT and LON are the geographical coordinates where the conference is taking
+  place.
+- NOISE is the percentage of random error added to the distance (e.g., 0.2 for
+  20%)
 
 The script will generate:
 
 - map.geojson: a Geojson file displaying the various points of travel with color
   describing whether they are by plane or not. This can be plotted, e.g.,
   with [uMap](http://umap.openstreetmap.fr/fr/).
+- `anonymized_participants`, a comma-separated list of participants with headers
+  and with the following fields:
+  - Field 1: mode of first leg (as above)
+  - Field 2: distance of first leg in meters, with random error added
+  - Field 3: mode of second leg (as above)
+  - Field 4: distance of second leg in meters, with random error added
+  - All additional fields in the input are left as-is.
 - `trips_with_footprint`, a comma-separated list of trips with the following
   fields:
   - Field 1: name (note that commas are dropped from names)
@@ -48,65 +76,22 @@ The script will generate:
 - It will also output some aggregate values on the standard error output, and
   prepare temporary files `trips` and `trips_with_dist`
 
-## Highlights'22 methodology
+## Footprint computation methodology
 
-### Registration form data collection
+### Local participants
 
-The Highlights registration form asked particiants:
-
-- "To estimate the carbon footprint of this edition of Highlights, please give
-  us some information about your travel"
-- "Arriving from": city and country, free text
-- "Arriving by": other / plane / train / bus or coach / car / local transportation (for locals)
-- ditto for departure
-- Extended stays: we asked whether:
-  - They participated to a co-located conference
-  - They participated to an extended stay support scheme
-  - They were "extending their stay for scientific reasons by another way"
-
-The fields were optional, but almost everyone filled them.
-
-### Processing and completing the registration form information
-
-We took the registration data and manually removed obviously fake submissions
-and apparent duplicates.
-
-We ignored local participants, for which we estimate a CO2 footprint of 0.
-
-For people who did not fill in the details of their travel, we:
-
-- assumed that they were coming to/from the institution of their first
-  affiliation
-- when the transportation mechanism was unspecified, we assumed that trips of
-  <=400km were done by rail and trips of >400km were done by plane, following:
-  https://github.com/ConferenceCarbonTracker/CarbonFootprintAGU#44-mode-of-transport
-
-This gives us a list of trips: each participant has 2 trips, each trip has an origin and
-destination (one of them conference venue) and a transportation mode.
+We ignore local participants, for which we estimate a CO2 footprint of 0.
 
 ### Geocoding and distance computation
 
-We manually translated the free-form city and country to a
-machine-understandable location by searching by hand for the closest
-three-letter code (airport or metropolitan area). We used the OpenFlights
+ We used the OpenFlights
 database airport-extended.dat on [this page](https://openflights.org/data.html)
 to convert these to geographical coordinates, and used known geographic
 coordinates for the conference venue. We used GeodSolve to compute the distance of each trip.
 
-### Adjusting for other scientific reasons
-
-For participants whose stay had other scientific justifications (no matter
-which), we counted only the longest of the two trips. The effect is basically to
-halve their emissions by considering that the conference carries half the
-responsibility. The reason why we do this instead of dividing the total by two
-is to make sure that we account for one of the "long trips" required between
-their institution and conference venue: indeed, some participants gave details
-of these long trips, whereas other gave details of one long trip and one trip to
-a neighboring place, e.g., for an extended stay.
+### Carbon footprint
 
-### Footprint computation
-
-Given this list of trips, we then compute their CO2 fotprint following the
+We compute the CO2 fotprint following the
 [labos1point5](https://labos1point5.org/ges-1point5) data, which is adapted from
 the French agency [Ademe](https://www.ademe.fr/).
 
@@ -129,3 +114,36 @@ the French agency [Ademe](https://www.ademe.fr/).
 
 We then sum the total emissions to arrive at the final value.
 
+## Highlights'22 methodology
+
+### Data collection
+
+The [Highlights registration
+form](https://framaforms.org/highlights2022-on-site-registration-1652701135)
+([archive](https://web.archive.org/web/20220622164245/https://framaforms.org/highlights2022-on-site-registration-1652701135))
+asked particiants:
+
+- "To estimate the carbon footprint of this edition of Highlights, please give
+  us some information about your travel"
+- "Arriving from": city and country, free text
+- "Arriving by": other / plane / train / bus or coach / car / local transportation (for locals)
+- ditto for departure
+- Extended stays: we asked whether:
+  - They participated to a co-located conference
+  - They participated to an extended stay support scheme
+  - They were "extending their stay for scientific reasons by another way"
+
+The fields were optional, but almost everyone filled them.
+
+### Adjusting for other scientific reasons
+
+In the carbon footprint, to account for participants whose stay had other
+scientific justifications (no matter which), we counted only the longest of the
+two trips. The effect is basically to halve their emissions by considering that
+the conference carries half the responsibility. The reason why we do this
+instead of dividing the total by two is to make sure that we account for one of
+the "long trips" required between their institution and conference venue:
+indeed, some participants gave details of these long trips, whereas other gave
+details of one long trip and one trip to a neighboring place, e.g., for an
+extended stay.
+
diff --git a/addnoise.py b/addnoise.py
@@ -0,0 +1,16 @@
+#!/usr/bin/env python3
+
+import sys
+from random import uniform
+
+noise = float(sys.argv[1])
+
+print( "mode,distance in meters")
+
+for l in sys.stdin.readlines():
+    f = l.strip().split(',')
+    mode = f[0]
+    dist = float(f[3])
+    dist_anon = round(uniform(dist * (1-noise), dist * (1+noise)))
+    print(','.join((mode, str(dist_anon))))
+
diff --git a/co2.py b/co2.py
@@ -8,8 +8,6 @@ import json
 import sys
 from collections import defaultdict
 
-seen = set()
-
 places = defaultdict(lambda: (0, 0, None))
 
 n_trips = 0
@@ -22,23 +20,11 @@ co2_by_type = defaultdict(lambda : 0)
 
 for l in sys.stdin.readlines():
     f = l.strip().split(',')
-    person = f[0]
-    inst = f[1]
-    mode = f[2]
-    multitrip = f[3] == "True"
-    lat = f[4]
-    lon = f[5]
-
-    if multitrip and person in seen:
-        # for a multi-purpose trip, only count the first transport leg of that
-        # person
-        # we assume that the input is sorted by decreasing distance so that it's
-        # the longest leg
-        continue
-
-    seen.add(person)
+    mode = f[0]
+    lat = f[1]
+    lon = f[2]
 
-    distance = float(f[6])
+    distance = float(f[3])
 
     if mode.strip() not in ['plane', 'train', 'bus/coach']:
         if distance > 400000:
@@ -52,7 +38,7 @@ for l in sys.stdin.readlines():
 
     k = (lat,lon)
     plane = mode == "plane"
-    places[k] = (places[k][0] + (1 if plane else 0), places[k][1] + 1, inst)
+    places[k] = (places[k][0] + (1 if plane else 0), places[k][1] + 1)
     dist_by_type[mode] += distance
     n_trips += 1
 
@@ -72,7 +58,7 @@ for l in sys.stdin.readlines():
     co2 = (g_km_person * (distance / 1000.))/1000.
     co2_by_type[mode] += co2
     total_co2 += co2
-    print (','.join([person, inst, str(distance), mode, str(co2)]))
+    print (','.join([str(distance), mode, str(co2)]))
 
 ## OUTPUT GEOJSON
 
@@ -86,7 +72,8 @@ for k in places.keys():
     feature = {
         "type": "Feature",
         "properties": {
-            "name":places[k][2], "_umap_options": {"color": color}
+            #"name":places[k][2],
+            "_umap_options": {"color": color}
         },
         "geometry": {
             "type": "Point",
diff --git a/compute.py b/compute.py
@@ -27,28 +27,25 @@ n_extend = 0
 with open(datafile, 'r') as f:
     csvreader = csv.reader(f)
     for row in csvreader:
-        name = row[0].replace(',', '')
-        institution = row[1].replace(',', '')
-        fromcode = row[2]
-        frommode = row[3]
-        tocode = row[4]
-        tomode = row[5]
-        extend = row[6] == "True"
-        added = False
+        fromcode = row[0]
+        frommode = row[1]
+        tocode = row[2]
+        tomode = row[3]
         n_participants += 1
-        if extend:
-            n_extend += 1
+        if fromcode == localcode:
+            assert (frommode in ["local", "other", ""])
+            assert (tomode in ["local", "other", ""])
+            assert (tocode == localcode)
+            continue
+        if frommode == "":
+            frommode = "other"
+        if tomode == "":
+            tomode = "other"
+        assert (frommode in ["bus/coach", "plane", "train", "other"])
+        assert (tomode in ["bus/coach", "plane", "train", "other"])
+        n_nonlocals += 1
         for (mode, code) in [(frommode, fromcode), (tomode, tocode)]:
-            if code == localcode:
-                continue
-            added = True
-            n_nonlocal_trips += 1
-            print (','.join((name, institution, mode, str(extend),
-                airports[code][0], airports[code][1])))
-        if added:
-            n_nonlocals += 1
+            print (','.join((mode, airports[code][0], airports[code][1])))
 
 print("%d total participants" % n_participants, file=sys.stderr)
 print("%d nonlocal participants" % n_nonlocals, file=sys.stderr)
-print("%d nonlocal trips" % n_nonlocal_trips, file=sys.stderr)
-print("%d extending" % n_extend, file=sys.stderr)
diff --git a/run.sh b/run.sh
@@ -7,6 +7,7 @@ FILE="$1"
 LOCALCODE="$2"
 LAT="$3"
 LON="$4"
+NOISE="$5"
 
 if [ ! -f airports.dat ]
 then
@@ -15,6 +16,7 @@ then
 fi
 
 ./compute.py "$FILE" "$LOCALCODE" > trips
-paste -d, trips <(cut -d, -f5,6 trips | tr ',' ' ' | sed "s/^/$LAT $LON /" | GeodSolve -i| cut -d ' ' -f3 ) | sort -t',' -k7,7nr > trips_with_dist
+paste -d, trips <(cut -d, -f2,3 trips | tr ',' ' ' | sed "s/^/$LAT $LON /" | GeodSolve -i| cut -d ' ' -f3 ) | sort -t',' -k4,4nr > trips_with_dist
+./addnoise.py "$NOISE" < trips_with_dist > trips_anonymized.csv
 python3 co2.py < trips_with_dist > trips_with_footprint
 
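
Illustration (not part of the commit): the two distance-based rules touched by this change can be sketched in isolation. This is a minimal sketch assuming the README's 400 km cut-off for unspecified modes and the uniform noise drawn in addnoise.py. The names `infer_mode`, `add_noise` and `RAIL_PLANE_THRESHOLD_M` are hypothetical and do not appear in the repository.

```python
import random

# Assumption: 400 km cut-off taken from the README (<=400km rail, >400km plane).
RAIL_PLANE_THRESHOLD_M = 400_000

KNOWN_MODES = ("plane", "train", "bus/coach")


def infer_mode(mode, distance_m):
    """Keep a known mode; otherwise guess rail or plane from the distance."""
    mode = mode.strip()
    if mode in KNOWN_MODES:
        return mode
    return "plane" if distance_m > RAIL_PLANE_THRESHOLD_M else "train"


def add_noise(distance_m, noise, rng=random):
    """Draw uniformly in [d*(1-noise), d*(1+noise)] and round to whole meters.

    `noise` is a fraction (0.2 means +/-20%), as in the README example.
    `rng` is any object with a uniform(a, b) method, so a seeded
    random.Random can be passed for reproducible output.
    """
    return round(rng.uniform(distance_m * (1 - noise), distance_m * (1 + noise)))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    for mode, dist in [("other", 350_000.0), ("", 1_200_000.0), ("train", 900_000.0)]:
        print(infer_mode(mode, dist), add_noise(dist, 0.2, rng))
```

Passing the generator as a parameter is a design choice of this sketch only; addnoise.py itself calls `random.uniform` directly and reads the noise level from `sys.argv[1]`.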